Abstract syntax tree, or AST, is the central concept in compilers.
AST is a data-structure. It’s a tree that models the syntax of a programming language. But it abstracts away from the mundane details of syntax, such as the exact placement of parenthesis or semicolons.
This tree consists of nodes, where each node is a data object that represents a syntactic construct in the language. A Return
node could represent a return
statement, an Add
node can represent a +
operator, an identifier referring to a variable could use an Id
node, and so on.
For example, the following line of code:
return n * factorial(n - 1);
Can have an AST like this:
new Return(
new Multiply(
new Id("n"),
new Call("factorial", [
new Subtract(new Id("n"), new Number(1)),
])))
In the next figure you can see a graphical representation of the same tree.
Why use an AST? When working with a language construct, an AST makes it convenient to operate on it: to query, construct, and modify.
We design an AST so that it is convenient for us, depending on what we do with it: what kinds of transformations we want to make, which things to query or change.
For example, an AST can include source location information for error reporting. Or it can include documentation comments if our compiler needs to deal with those. Or it can include all comments and some notion of whitespace, if we want to have style-preserving transformations, like automatic refactoring.
Well, actually…
There are also concrete syntax trees, also called parse trees. They reflect the structure and hierarchy down to each input symbol. They usually have too many details that we don’t care about when writing a compiler. But they are a good match for style-preserving transformations.
We will represent each kind of node in our AST as a separate class: Return
, Multiply
, Id
, etc. However, at the type level, we want to be able to refer to “any AST node”. For that TypeScript gives us several tools:
Either of these works. We will use an interface:
interface AST {
equals(other: AST): boolean;
}
Each type of AST node will be a class that implements the AST
interface. It starts simple, with equals
as the only method. We will add methods to this interface (and the classes) as needed.
The equals
method is mostly useful for unit-testing. The implementation is quite mundane, so for the most part, we will omit it and replace its body with an ellipsis.
Our first node is Number
.
class Number implements AST {
constructor(public value: number) {}
equals(other: AST) {…}
}
Here we used the TypeScript shortcut for quickly defining instance variables using public
for the constructor parameter. Remember that it is equivalent to the following:
class Number implements AST {
public value: number;
constructor(value: number) {
this.value = value;
}
equals(other: AST) {…}
}
It saves us quite some typing, which will be useful because we need to define many types of AST nodes.
We called our AST node Number
because the data type in JavaScript and TypeScript is called number
. However, our compiler will handle only unsigned integers.
Source | AST |
---|---|
0 |
new Number(0) |
42 |
new Number(42) |
Identifiers, or ids for short, refer to variables in the code.
class Id implements AST {
constructor(public value: string) {}
equals(other: AST) {…}
}
Source | AST |
---|---|
x |
new Id("x") |
hello |
new Id("hello") |
The next AST node is Not
, which stands for the logical negation operator (!
).
class Not implements AST {
constructor(public term: AST) {}
equals(other: AST) {…}
}
Source | AST |
---|---|
!x |
new Not(new Id("x")) |
!42 |
new Not(new Number(42)) |
Negation is the only prefix operator (or unary operator) that we define. However, we define several infix operators (or binary operators).
Equality operator (==
) and its opposite (!=
).
class Equal implements AST {
constructor(public left: AST, public right: AST) {}
equals(other: AST) {…}
}
class NotEqual implements AST {
constructor(public left: AST, public right: AST) {}
equals(other: AST) {…}
}
Source | AST |
---|---|
x == y |
new Equal(new Id("x"), new Id("y")) |
10 != 25 |
new NotEqual(new Number(10), new Number(25)) |
Addition (+
), subtraction (-
), multiplication (*
), and division (/
):
class Add implements AST {
constructor(public left: AST, public right: AST) {}
equals(other: AST) {…}
}
class Subtract implements AST {
constructor(public left: AST, public right: AST) {}
equals(other: AST) {…}
}
class Multiply implements AST {
constructor(public left: AST, public right: AST) {}
equals(other: AST) {…}
}
class Divide implements AST {
constructor(public left: AST, public right: AST) {}
equals(other: AST) {…}
}
Source | AST |
---|---|
x + y |
|
10 * 25 |
|
Note that since the parameters are ASTs themselves, that means they can be arbitrarily nested.
For example, 42 + !(20 != 10)
will be:
new Add(new Number(42),
new Not(new NotEqual(new Number(20),
new Number(10))))
Not all combinations of AST nodes make sense. Nonetheless, the one above happens to be valid in JavaScript.
Call
refers to a function name (or callee), and an array of arguments. For example, f(x)
becomes: new Call("f", [new Id("x")])
.
class Call implements AST {
constructor(public callee: string,
public args: Array<AST>) {}
equals(other: AST) {
return other instanceof Call &&
this.callee === other.callee &&
this.args.length === other.args.length &&
this.args.every((arg, i) =>
.equals(other.args[i]));
arg
} }
The language of our baseline compiler is restricted such that only named functions can be called.
We can’t name the arguments array arguments
, since this clashes with JavaScript built-in arguments
object. So, with some reluctance, let’s call it args
.
The Call
node is interesting because it has both a primitive string and an array of AST as its members. JavaScript doesn’t have an agreed-upon protocol for equality; that’s why Call
makes an excellent example of how to implement the equals
method in JavaScript:
instanceof
operator to check that the other AST is also a Call
.callee
strings using the ===
operator..equals
method for comparing AST nodes of each argument.every
element.Languages other than JavaScript often have more elegant ways of dealing with structural equality.
Source | AST |
---|---|
f(x, y) |
|
factorial(n - 1) |
|
class Return implements AST {
constructor(public term: AST) {}
equals(other: AST) {…}
}
Source | AST |
---|---|
return 0; |
|
return n - 1; |
|
Block
refers to a block of code delimited with curly braces.
class Block implements AST {
constructor(public statements: Array<AST>) {}
equals(other: AST) {…}
}
Source | AST |
---|---|
|
|
|
|
The If
node has three branches:
conditional
refers to the expression that is evaluated to either true or false,consequence
is the branch taken in the true case, andalternative
is the branch taken in the false case.class If implements AST {
constructor(public conditional: AST,
public consequence: AST,
public alternative: AST) {}
equals(other: AST) {…}
}
This way, the following:
if (x == 42) {
return x;
else {
} return y;
}
Becomes:
new If(
new Equal(new Id("x", new Number(42))),
new Block([new Return(new Id("x"))]),
new Block([new Return(new Id("y"))]),
)
Curly braces are optional for if statements, thus, the following:
if (x == 42)
return x;
else
return y;
Becomes:
new If(
new Equal(new Id("x", new Number(42))),
new Return(new Id("x")),
new Return(new Id("y")),
)
How do we represent an if
without the else
branch? We could have a separate node for it, or we can do a simple trick: representing if (x) y
the same way as if (x) y else {}
. In other words, by placing an empty Block
as the alternative
.
A function definition consists of a function name, an array of parameters, and the function’s body.
class Function implements AST {
constructor(public name: string,
public parameters: Array<string>,
public body: AST) {}
equals(other: AST) {…}
}
Consider the following function definition.
function f(x, y) {
return y;
}
When converted to an AST it becomes as follows.
new Function("f", ["x", "y"], new Block([
new Return(new Id("y")),
]))
Notice that in a function definition, the parameters are strings, while in a function call, they are ASTs. This fact reflects that function calls can have nested expressions, while function definitions simply list the inbound variable names.
Var
nodes are for variable declarations. So var x = 42;
becomes new Var("x", new Number(42))
.
class Var implements AST {
constructor(public name: string, public value: AST) {}
equals(other: AST) {…}
}
An assignment is represented with the node Assign
.
class Assign implements AST {
constructor(public name: string,
public value: AST) {}
equals(other: AST) {…}
}
Assignment differs from variable declaration in the following way: assignment changes the value of an existing variable, and does not define a new one. At least, that’s the distinction that we will assume. JavaScript allows assignment of a variable that is not defined yet; in such case, it will create a global variable. TypeScript, on the other hand, dissallows this error-prone behavior.
Source | AST |
---|---|
x = 42; |
|
y = a + b; |
|
The last node of our AST and the last construct in our baseline language is the while loop.
class While implements AST {
constructor(public conditional: AST,
public body: AST) {}
equals(other: AST) {…}
}
Source | AST |
---|---|
|
|
|
|
Let’s look at a larger snippet converted to an AST. Remember our factorial function:
function factorial(n) {
var result = 1;
while (n != 1) {
= result * n;
result = n - 1;
n
}return result;
}
And here is the corresponding AST:
new Function("factorial", ["n"], new Block([
new Var("result", new Number(1)),
new While(new NotEqual(new Id("n"),
new Number(1)), new Block([
new Assign("result", new Multiply(new Id("result"),
new Id("n"))),
new Assign("n", new Subtract(new Id("n"),
new Number(1))),
,
]))new Return(new Id("result")),
]))
We finish this chapter with a table that summarizes all the AST constructors that we’ve covered, their signatures, and examples of what source code these AST nodes can represent.
AST Constructor Signature | Example |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In the next two chapters, you will learn about converting a program from source to an AST, or in other words, about parsing.