Compiling to Assembly
from Scratch

Table of Contents


Chapter 4
Abstract Syntax Tree

Abstract syntax tree, or AST, is the central concept in compilers.

AST is a data-structure. It’s a tree that models the syntax of a programming language. But it abstracts away from the mundane details of syntax, such as the exact placement of parenthesis or semicolons.

This tree consists of nodes, where each node is a data object that represents a syntactic construct in the language. A Return node could represent a return statement, an Add node can represent a + operator, an identifier referring to a variable could use an Id node, and so on.

For example, the following line of code:

return n * factorial(n - 1);

Can have an AST like this:

new Return(
  new Multiply(
    new Id("n"),
    new Call("factorial", [
      new Subtract(new Id("n"), new Number(1)),
    ])))

In the next figure you can see a graphical representation of the same tree.

AST corresponding to n * factorial(n - 1)

Why use an AST? When working with a language construct, an AST makes it convenient to operate on it: to query, construct, and modify.

We design an AST so that it is convenient for us, depending on what we do with it: what kinds of transformations we want to make, which things to query or change.

For example, an AST can include source location information for error reporting. Or it can include documentation comments if our compiler needs to deal with those. Or it can include all comments and some notion of whitespace, if we want to have style-preserving transformations, like automatic refactoring.

Well, actually…

There are also concrete syntax trees, also called parse trees. They reflect the structure and hierarchy down to each input symbol. They usually have too many details that we don’t care about when writing a compiler. But they are a good match for style-preserving transformations.

We will represent each kind of node in our AST as a separate class: Return, Multiply, Id, etc. However, at the type level, we want to be able to refer to “any AST node”. For that TypeScript gives us several tools:

Either of these works. We will use an interface:

interface AST {
  equals(other: AST): boolean;
}

Each type of AST node will be a class that implements the AST interface. It starts simple, with equals as the only method. We will add methods to this interface (and the classes) as needed.

The equals method is mostly useful for unit-testing. The implementation is quite mundane, so for the most part, we will omit it and replace its body with an ellipsis.

Number node

Our first node is Number.

class Number implements AST {
  constructor(public value: number) {}

  equals(other: AST) {…}
}

Here we used the TypeScript shortcut for quickly defining instance variables using public for the constructor parameter. Remember that it is equivalent to the following:

class Number implements AST {
  public value: number;

  constructor(value: number) {
    this.value = value;
  }

  equals(other: AST) {…}
}

It saves us quite some typing, which will be useful because we need to define many types of AST nodes.

We called our AST node Number because the data type in JavaScript and TypeScript is called number. However, our compiler will handle only unsigned integers.

Examples of the Number node
Source AST
0 new Number(0)
42 new Number(42)

Identifier node

Identifiers, or ids for short, refer to variables in the code.

class Id implements AST {
  constructor(public value: string) {}

  equals(other: AST) {…}
}
Examples of the Id node
Source AST
x new Id("x")
hello new Id("hello")

Operator nodes

The next AST node is Not, which stands for the logical negation operator (!).

class Not implements AST {
  constructor(public term: AST) {}

  equals(other: AST) {…}
}
Examples of the Not node
Source AST
!x new Not(new Id("x"))
!42 new Not(new Number(42))

Negation is the only prefix operator (or unary operator) that we define. However, we define several infix operators (or binary operators).

Equality operator (==) and its opposite (!=).

class Equal implements AST {
  constructor(public left: AST, public right: AST) {}

  equals(other: AST) {…}
}
class NotEqual implements AST {
  constructor(public left: AST, public right: AST) {}

  equals(other: AST) {…}
}
Examples of Equal and NotEqual nodes
Source AST
x == y new Equal(new Id("x"), new Id("y"))
10 != 25 new NotEqual(new Number(10), new Number(25))

Addition (+), subtraction (-), multiplication (*), and division (/):

class Add implements AST {
  constructor(public left: AST, public right: AST) {}

  equals(other: AST) {…}
}
class Subtract implements AST {
  constructor(public left: AST, public right: AST) {}

  equals(other: AST) {…}
}
class Multiply implements AST {
  constructor(public left: AST, public right: AST) {}

  equals(other: AST) {…}
}
class Divide implements AST {
  constructor(public left: AST, public right: AST) {}

  equals(other: AST) {…}
}
Examples of Add and Multiply nodes
Source AST
x + y
          new Add(new Id("x"),
                  new Id("y"))
10 * 25
  new Multiply(new Number(10),
               new Number(25))

Note that since the parameters are ASTs themselves, that means they can be arbitrarily nested.

For example, 42 + !(20 != 10) will be:

new Add(new Number(42),
        new Not(new NotEqual(new Number(20),
                             new Number(10))))

Not all combinations of AST nodes make sense. Nonetheless, the one above happens to be valid in JavaScript.

Call node

Call refers to a function name (or callee), and an array of arguments. For example, f(x) becomes: new Call("f", [new Id("x")]).

class Call implements AST {
  constructor(public callee: string,
              public args: Array<AST>) {}

  equals(other: AST) {
    return other instanceof Call &&
      this.callee === other.callee &&
      this.args.length === other.args.length &&
      this.args.every((arg, i) =>
        arg.equals(other.args[i]));
  }
}

The language of our baseline compiler is restricted such that only named functions can be called.

We can’t name the arguments array arguments, since this clashes with JavaScript built-in arguments object. So, with some reluctance, let’s call it args.

The Call node is interesting because it has both a primitive string and an array of AST as its members. JavaScript doesn’t have an agreed-upon protocol for equality; that’s why Call makes an excellent example of how to implement the equals method in JavaScript:

Languages other than JavaScript often have more elegant ways of dealing with structural equality.

Examples of the Call node
Source AST
f(x, y)
  new Call("f", [
    new Id("x"),
    new Id("y"),
  ])
factorial(n - 1)
  new Call("factorial", [
    new Subtract(new Id("n"),
                 new Number(1)),
  ])

Return node

class Return implements AST {
  constructor(public term: AST) {}

  equals(other: AST) {…}
}
Examples of the Return node
Source AST
return 0;
new Return(new Number(0))
return n - 1;
new Return(
  new Subtract(new Id("n"),
               new Number(1)))

Block node

Block refers to a block of code delimited with curly braces.

class Block implements AST {
  constructor(public statements: Array<AST>) {}

  equals(other: AST) {…}
}
Examples of the Block node
Source AST
{}
new Block([])
{
  f(x);
  return y;
}
new Block([
  new Call("f", [new Id("x")]),
  new Return(new Id("y")),
])

If node

The If node has three branches:

class If implements AST {
  constructor(public conditional: AST,
              public consequence: AST,
              public alternative: AST) {}

  equals(other: AST) {…}
}

This way, the following:

if (x == 42) {
  return x;
} else {
  return y;
}

Becomes:

new If(
  new Equal(new Id("x", new Number(42))),
  new Block([new Return(new Id("x"))]),
  new Block([new Return(new Id("y"))]),
)

Curly braces are optional for if statements, thus, the following:

if (x == 42)
  return x;
else
  return y;

Becomes:

new If(
  new Equal(new Id("x", new Number(42))),
  new Return(new Id("x")),
  new Return(new Id("y")),
)

How do we represent an if without the else branch? We could have a separate node for it, or we can do a simple trick: representing if (x) y the same way as if (x) y else {}. In other words, by placing an empty Block as the alternative.

Function definition node

A function definition consists of a function name, an array of parameters, and the function’s body.

class Function implements AST {
  constructor(public name: string,
              public parameters: Array<string>,
              public body: AST) {}

  equals(other: AST) {…}
}

Consider the following function definition.

function f(x, y) {
  return y;
}

When converted to an AST it becomes as follows.

new Function("f", ["x", "y"], new Block([
  new Return(new Id("y")),
]))

Notice that in a function definition, the parameters are strings, while in a function call, they are ASTs. This fact reflects that function calls can have nested expressions, while function definitions simply list the inbound variable names.

Variable declaration node

Var nodes are for variable declarations. So var x = 42; becomes new Var("x", new Number(42)).

class Var implements AST {
  constructor(public name: string, public value: AST) {}

  equals(other: AST) {…}
}

Assignment node

An assignment is represented with the node Assign.

class Assign implements AST {
  constructor(public name: string,
              public value: AST) {}

  equals(other: AST) {…}
}

Assignment differs from variable declaration in the following way: assignment changes the value of an existing variable, and does not define a new one. At least, that’s the distinction that we will assume. JavaScript allows assignment of a variable that is not defined yet; in such case, it will create a global variable. TypeScript, on the other hand, dissallows this error-prone behavior.

Examples of the Assign node
Source AST
x = 42;
new Assign("x", new Number(42))
y = a + b;
new Assign("y",
           new Add(new Id("a"),
                   new Id("b"))))

While loop node

The last node of our AST and the last construct in our baseline language is the while loop.

class While implements AST {
  constructor(public conditional: AST,
              public body: AST) {}

  equals(other: AST) {…}
}
Examples of the While node
Source AST
while (x)
  f();
new While(
  new Id("x"),
  new Call("f", []))
while (x) {
  f();
}
new While(new Id("x"), new Block([
  new Call("f", []),
]))

Summary

Let’s look at a larger snippet converted to an AST. Remember our factorial function:

function factorial(n) {
  var result = 1;
  while (n != 1) {
    result = result * n;
    n = n - 1;
  }
  return result;
}

And here is the corresponding AST:

new Function("factorial", ["n"], new Block([
  new Var("result", new Number(1)),
  new While(new NotEqual(new Id("n"),
                         new Number(1)), new Block([
    new Assign("result", new Multiply(new Id("result"),
                                      new Id("n"))),
    new Assign("n", new Subtract(new Id("n"),
                               new Number(1))),
  ])),
  new Return(new Id("result")),
]))

We finish this chapter with a table that summarizes all the AST constructors that we’ve covered, their signatures, and examples of what source code these AST nodes can represent.

Summary of AST constructor signatures with examples
AST Constructor Signature Example
Number(value: number)
Id(value: string)
Not(term: AST)
Equal(left: AST, right: AST)
NotEqual(left: AST, right: AST)
Add(left: AST, right: AST)
Subtract(left: AST, right: AST)
Multiply(left: AST, right: AST)
Divide(left: AST, right: AST)
42
x
!term
left == right
left != right
left + right
left - right
left * right
left / right
Call(callee: string,
     args: Array<AST>)
callee(a1, a2, a3)
Return(term: AST)
return term;
Block(statements: Array<AST>)
{ s1; s2; s3; }
If(conditional: AST,
   consequence: AST,
   alternative: AST)
if (conditional)
  consequence
else
  alternative
Function(
  name: string,
  parameters: Array<string>,
  body: AST,
)
function name(p1) {
  body;
}
Var(name: string,
    value: AST)
var name = value;
Assignment(name: string,
           value: AST)
name = value;
While(conditional: AST,
      body: AST)
while (conditional)
  body;

In the next two chapters, you will learn about converting a program from source to an AST, or in other words, about parsing.

Next: Chapter 5. Parser Combinators