13. Static Type Checking and Inference • Compiling to Assembly from Scratch

In this chapter, we will implement a type checker for our language with local type inference. Local type inference means that local variables do not need a type annotation, so you can write the following:

let a = [1, 2, 3];

Our type checker will cover such types as number, boolean, void, and Array<T> where T could be any other type, for example Array<Array<boolean>>. We will also deal with function types, such as (x: boolean, y: number) => number, when type checking function calls.

First, we need to refer to different types to manipulate them. We will represent them similarly to AST nodes, with an interface called Type and one class that implements it per type.

interface Type {
  equals(other: Type): boolean;
  toString(): string;
}

We will need equality and a toString method to be able to display type errors.

Summary of Type constructor signatures with examples
Type Constructor Signature	Example
`BooleanType()`	`boolean`

`NumberType()`	`number`

`VoidType()`	`void`

`ArrayType(element: Type)`	`Array<number>`

`FunctionType( parameters: Map<string, Type>, returnType: Type, )`	`(x: boolean, y: number) => number`

Implementing toString and equals for these data classes is straightforward. However, FunctionType requires some comments:

Our language will change to allow functions to be type-annotated. We will first change our Function AST node to store its signature as FunctionType, instead of just parameter names:

class Function implements AST {
  constructor(public name: string,
              public signature: FunctionType,
              public body: AST) {}

  visit<T>(v: Visitor<T>) {…}

  equals(other: AST): boolean {…}
}

We will also need to change our grammar and parser. After introducing the necessary tokens, we can define the following grammar for types.

arrayType <- ARRAY LESS_THAN type GREATER_THAN
type <- VOID | BOOLEAN | NUMBER | arrayType

Next, we need to change the grammar and parser for functions, to include optional type annotations:

We allow optional type annotations for function parameters and function return type. Why optional? Some languages do infer parameter and return types, but we do it so we can use the same parser unmodified later with dynamic types. If a type annotation is missing, we default it to number, which also gives us some backwards compatibility with our test suite.

Now, the parser will accept functions with type annotations, like the following:

function pair(x: number, y: number): Array<number> {
  return [x, y];
}

For such function, the parser will produce a Function node with FunctionType signature like this:

new Function(
  "pair",
  new FunctionType(
    new Map([["x", new NumberType()],
             ["y", new NumberType()]]),
    new ArrayType(new NumberType()),
  ),
  new Block([
    new Return(new ArrayLiteral([new Id("x"),
                                 new Id("y")])),
  ]),
)

Next, for type checking, we will use a function that asserts that two types are the same and otherwise raises an exception to signal an error. We call it assertType:

function assertType(expected: Type, got: Type): void {
  if (!expected.equals(got)) {
    throw TypeError(
      `Expected ${expected}, but got ${got}`);
  }
}

It uses both the equals method and the toString method, which is invoked implicitly for template string variables. Here, we are slightly abusing the built-in TypeError exception, which has a different purpose (run-time type errors); it is better to define a custom exception type.

Our TypeChecker pass will walk the AST and will either abort with a TypeError, or will return the inferred Type of an expression. Thus, TypeChecker implements Visitor<Type>.

class TypeChecker implements Visitor<Type> {
  constructor(
    public locals: Map<string, Type>,
    public functions: Map<string, FunctionType>,
    public currentFunctionReturnType: Type | null,
  ) {}

  …
}

We need two separate environments for these since functions are not first-class in our language: they cannot be assigned to a variable and passed around. So, when we encounter a Call, we will search for a function in the functions map, and when we encounter an Id, we will search for a non-function type in the locals map.

We also maintain an instance variable currentFunctionReturnType which will help us type-check Return statements.

Scalars

The type of literal scalars is the easiest to infer. We know that the type of a number literal is number, the type of a boolean node is boolean, and so forth.

  visitNumber(node: Number) {
    return new NumberType();
  }
  
  visitBoolean(node: Boolean) {
    return new BooleanType();
  }

In TypeScript the void type is inferred for expressions or statements that return undefined.

  visitUndefined(node: Undefined) {
    return new VoidType();
  }

Operators

Now, let’s look at the most straightforward operator—negation. That’s not exactly how it works in TypeScript, but let’s assume that the negation operator expects strictly boolean parameter. To enforce that, we first infer the inner term’s type by calling the visit method on it. Then we assert that it is boolean:

  visitNot(node: Not) {
    assertType(new BooleanType(), node.term.visit(this));
    return new BooleanType();
  }

To type-check numeric operators like Add, we infer the type of left and right parameters and assert that they are both numbers. Then we return number as the resulting type.

  visitAdd(node: Add) {
    assertType(new NumberType(), node.left.visit(this));
    assertType(new NumberType(), node.right.visit(this));
    return new NumberType();
  }

Now we get to something more interesting. Operations such as equality are generic: they can work with any type as long as the type on the left-hand side is the same as the one on the right-hand side. We infer the types on both sides by visiting them, and then assert that the two are the same.

  visitEqual(node: Equal) {
    let leftType = node.left.visit(this);
    let rightType = node.right.visit(this);
    assertType(leftType, rightType);
    return new BooleanType();
  }

Variables

For a new variable defined with var, we infer its type by visiting its value, and then we add that type to the locals environment.

  visitVar(node: Var) {
    let type = node.value.visit(this);
    this.locals.set(node.name, type);
    return new VoidType();
  }

When we encounter a variable, we look it up in the locals environment and return its type. If the variable is not defined—we raise a TypeError.

  visitId(node: Id) {
    let type = this.locals.get(node.value);
    if (!type) {
      throw TypeError(`Undefined variable ${node.value}`);
    }
    return type;
  }

When assigning a new value to a variable, we check that the variable was previously defined, and that the assignment does not change the type of the variable.

  visitAssign(node: Assign) {
    let variableType = this.locals.get(node.name);
    if (!variableType) {
      throw TypeError(`Undefined variable ${node.name}`);
    }
    let valueType = node.value.visit(this);
    assertType(variableType, valueType);
    return new VoidType();
  }

Arrays

Inferring the type of array literals is a bit trickier than the other literals that we’ve covered. We know that any array literal is of type Array<T>, but we need to figure out what T is. First of all, we can’t infer the type of an empty array—there’s simply not enough information at this point. There are bi-directional type inference algorithms that handle this, but often this is solved by requiring a type annotation.

If the array is non-empty, we infer the type of each element and then assert pair-wise that they are the same type.

  visitArrayLiteral(node: ArrayLiteral): Type {
    if (node.args.length == 0) {
      throw TypeError("Can't infer type of empty array");
    }
    let argsTypes =
      node.args.map((arg) => arg.visit(this));
    let elementType = argsTypes.reduce((prev, next) => {
      assertType(prev, next);
      return prev;
    });
    return new ArrayType(elementType);
  }

For something like our array length primitive, we need to assert that the parameter type is an array, but we don’t care about the element type. So, instead of using assertType we manually check that the type is an instance of ArrayType, and raise a TypeError otherwise. The inferred type of such expression is number.

  visitLength(node: Length) {
    let type = node.array.visit(this);
    if (type instanceof ArrayType) {
      return new NumberType();
    } else {
      throw TypeError(`Expected an array, got: ${type}`);
    }
  }

When handling array lookup, we assert that the index is a number, and that the array is of type array:

  visitArrayLookup(node: ArrayLookup): Type {
    assertType(new NumberType(), node.index.visit(this));
    let type = node.array.visit(this);
    if (type instanceof ArrayType) {
      return type.element;
    } else {
      throw TypeError(`Expected an array, got: ${type}`);
    }
  }

Functions

When encountering a function, we add its signature to the functions environment. We do not need to infer it, because it is parsed from the source. Before type-checking the body of the function, we create a new visitor with a modified environment. Since we use mutable maps, we need to pass a new Map to each function, to avoid modifying the wrong function’s environment. We also set the currentFunctionReturnType to the one in the signature.

  visitFunction(node: Function) {
    this.functions.set(node.name, node.signature);
    let visitor = new TypeChecker(
      new Map(node.signature.parameters),
      this.functions,
      node.signature.returnType,
    );
    node.body.visit(visitor);
    return new VoidType();
  }

When visiting a Call, we fetch that function’s signature from the functions environment. Then we infer the type of the function being called and compare it to the type from the environment. When inferring the type of the function we visit each argument, and since FunctionType requires a name for each parameter, we assign them dummy names, such as x0, x1, x2, etc. When it comes to the function’s return type, we use the one from the type annotation.

  visitCall(node: Call) {
    let expected = this.functions.get(node.callee);
    if (!expected) {
      throw TypeError(`Function ${node.callee} undefined`);
    }
    let argsTypes = new Map();
    node.args.forEach((arg, i) =>
      argsTypes.set(`x${i}`, arg.visit(this))
    );
    let got =
      new FunctionType(argsTypes, expected.returnType);
    assertType(expected, got);
    return expected.returnType;
  }

When checking a return statement, we infer the type of the returned value and compare it with the one from the function annotation, which we saved conveniently in currentFunctionReturnType instance variable.

  visitReturn(node: Return) {
    let type = node.term.visit(this);
    if (this.currentFunctionReturnType) {
      assertType(this.currentFunctionReturnType, type);
      return new VoidType();
    } else {
      throw TypeError(
        "Return statement outside of any function");
    }
  }

If and While

When checking If and While we visit each inner node to make sure that their type is checked, but we don’t enforce any type. The conditional could be checked to be boolean, but in TypeScript (as in many languages) it is not required to be such. We could also enforce that the statements inside the branches return void, but this is not usually required either.

  visitIf(node: If) {
    node.conditional.visit(this);
    node.consequence.visit(this);
    node.alternative.visit(this);
    return new VoidType();
  }
  
  visitWhile(node: While) {
    node.conditional.visit(this);
    node.body.visit(this);
    return new VoidType();
  }

However, if we had a ternary conditional operation, we would need to ensure that the two branches return the same type.

Soundness

Our type checker is complete. But is it sound? A type system is sound if the types at compile time are always consistent with run time. Can you find a soundness issue in our type checker?

Although we check that each return statement is consistent with the annotated return type, we don’t check that each control-flow path has a return statement. Here’s an example that demonstraits this issue:

function wrongReturnType(x: boolean): Array<number> {
  if (x) {
    return [42];
  }
}

function main() {
  var a = wrontReturnType(false);
  a[0]; // Segmentation fault
}

Checking that each control flow path leads to a return statement requires a bit of control-flow analysis which is outside of the scope of this book.

Error messages

We can improve our error messages by specializing them to each situation instead of relying on a generic message produced by assertType.