Compiling to Assembly from Scratch

Compiling to Assembly from Scratch — Vladimir Keleshev

— the book —

ARM — TypeScript — Summer 2020

So, you’ve been trying to learn how compilers and programming languages work?

Perhaps you’ve learned about compiling to JavaScript, or about building an interpreter? Or maybe about compiling to bytecode? All good steps.

But there’s a tension building up.

Because it feels a bit like cheating. Because you know that somewhere, somehow, the code you write is translated to assembly instructions. To the machine language. That’s where the rubber hits the road. That’s where it gets hot. And oh-so-many resources are hesitant to cover this part. But not this book.

This small book will show you in detail how you can build a compiler from scratch that goes all the way from source to assembly.

The example code is written in TypeScript, a dialect of JavaScript. The book describes the design and implementation of a compiler that emits 32-bit ARM assembly instructions.

Why ARM?

In many ways, the ARM instruction set is what makes this book possible.

Compared to Intel x86-64, the ARM instruction set is a work of art.

Intel x86-64 is the result of evolution from an 8-bit processor, to a 16-bit one, then to a 32-bit one, and finally to a 64-bit one. At each step of the evolution, it accumulated complexity and cruft. At each step, it tried to satisfy conflicting requirements.

Guess which one is an easier target for a compiler?

If this book targeted Intel x86-64 instead of ARM, it would have been twice as long and, more likely, never written. Also, with 160 billion devices shipped, we had better get used to the fact that ARM is the dominant instruction set architecture today.

In other words… ARM is a good start. After learning it, you will be better equipped for moving to x86-64 or the new ARM64.

Will you be able to run the code your compiler produces?

I bet you will! The Appendix will contain a bazillion ways to execute ARM code, from a Raspberry Pi or a cloud VM to various ways of emulating ARM on Linux, Windows, and macOS.

Why TypeScript?

First of all, you will be able to follow this book in any reasonable programming language. For me, it was tough to pick one for this job, and I’m pleased I’ve chosen TypeScript.

TypeScript is probably nobody’s favorite, but it’s a good compromise.

Don’t worry if you’ve never seen TypeScript code before. If you can read the following, you will most likely be able to pick it up as the book goes along (real code from the book here!):

class Label {
  static counter = 0;
  value: number; // Type annotation

  constructor() {
    this.value = Label.counter++;
  }

  toString() {
    return '.L' + this.value;
  }
}

I avoided using any TypeScript- or JavaScript-specific language features in the code.

If you’re into statically-typed functional programming languages (Haskell, OCaml, or ReasonML), you will find that the class structure I used has a nice translation to an algebraic data type. It is, in fact, how I wrote it first.
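To make that correspondence concrete, here is a minimal sketch of one tiny expression tree written both ways. The names (`Expr`, `Num`, `Add`, `ExprAdt`, `evaluateAdt`) are made up for this illustration and are not taken from the book's code; the sketch only assumes that the compiler's syntax tree is a small hierarchy of node classes.

```typescript
// Hypothetical mini-AST, for illustration only.

// Class-based encoding: one class per kind of node.
interface Expr {
  evaluate(): number;
}

class Num implements Expr {
  constructor(public value: number) {}
  evaluate(): number {
    return this.value;
  }
}

class Add implements Expr {
  constructor(public left: Expr, public right: Expr) {}
  evaluate(): number {
    return this.left.evaluate() + this.right.evaluate();
  }
}

// The same shape as an algebraic data type (a discriminated union),
// roughly `type expr = Num of int | Add of expr * expr` in OCaml.
type ExprAdt =
  | { tag: "Num"; value: number }
  | { tag: "Add"; left: ExprAdt; right: ExprAdt };

// One function with a case per variant replaces one method per class.
function evaluateAdt(e: ExprAdt): number {
  switch (e.tag) {
    case "Num":
      return e.value;
    case "Add":
      return evaluateAdt(e.left) + evaluateAdt(e.right);
  }
}

// Both encodings describe the expression 1 + 2.
const viaClasses = new Add(new Num(1), new Num(2)).evaluate();
const viaAdt = evaluateAdt({
  tag: "Add",
  left: { tag: "Num", value: 1 },
  right: { tag: "Num", value: 2 },
});
```

Each class becomes one variant of the union, and each method becomes one function that switches over the variants.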

The Contents

The book consists of two parts. Part I presents a detailed, step-by-step guide on how to develop a small “baseline” compiler that can compile simple programs to ARM assembly.

By the end of Part I, you will have a working compiler that can compile simple functions like this one:

function factorial(n) {
  if (n == 0) {
    return 1;
  } else {
    return n * factorial(n - 1);
  }
}

Into ARM assembly code like this:

.global factorial
factorial:
  push {fp, lr}
  mov fp, sp
  push {r0, r1}
  ldr r0, =0
  push {r0, ip}
  ldr r0, [fp, #-8]
  pop {r1, ip}
  cmp r0, r1
  moveq r0, #1
  movne r0, #0
  cmp r0, #0
  beq .L1
  ldr r0, =1
  b .L2
.L1:
  ldr r0, =1
  mov r1, r0
  ldr r0, [fp, #-8]
  sub r0, r0, r1
  bl factorial
  mov r1, r0
  ldr r0, [fp, #-8]
  mul r0, r0, r1
.L2:
  mov sp, fp
  pop {fp, pc}

This code won’t win any awards, and an optimizing compiler could do much better, but it’s a start!
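The `cmp r0, #0` / `beq` / `b` pattern and the two labels in the listing above are how the `if`/`else` shows up in assembly. As a rough sketch of how a compiler might emit that skeleton, the snippet below reuses the `Label` class shown earlier. The `emitIf` helper and its callback parameters are hypothetical names chosen for this illustration, not the book's actual interface, and the sketch assumes each piece of code leaves its result in `r0`.

```typescript
// Same Label class as shown earlier in the text.
class Label {
  static counter = 0;
  value: number;

  constructor() {
    this.value = Label.counter++;
  }

  toString() {
    return '.L' + this.value;
  }
}

// Hypothetical helper: appends the skeleton of an if/else to `out`.
// Each callback is assumed to append code that leaves its result in r0.
function emitIf(
  out: string[],
  emitCondition: () => void,
  emitConsequence: () => void,
  emitAlternative: () => void,
): void {
  const ifFalseLabel = new Label();
  const endIfLabel = new Label();
  emitCondition();
  out.push('  cmp r0, #0');          // is the condition false (zero)?
  out.push(`  beq ${ifFalseLabel}`); // if so, jump to the else branch
  emitConsequence();
  out.push(`  b ${endIfLabel}`);     // skip over the else branch
  out.push(`${ifFalseLabel}:`);
  emitAlternative();
  out.push(`${endIfLabel}:`);
}

// Usage: a condition that is always true, with two constant branches.
const lines: string[] = [];
emitIf(
  lines,
  () => lines.push('  ldr r0, =1'),
  () => lines.push('  ldr r0, =10'),
  () => lines.push('  ldr r0, =20'),
);
const assembly = lines.join('\n');
```

The label numbers depend on how many `Label` objects were created before the call, which is why a fresh run of this sketch produces `.L0` and `.L1` rather than the `.L1` and `.L2` seen in the listing.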

Part II talks about more advanced topics in less detail. It explores several different (often mutually exclusive) directions in which you can take your compiler.

Draft Table of Contents

Introduction

Part I

Part II

Appendix: Running ARM code

About me

My name is Vladimir Keleshev. I have worked with compilers both commercially and in open source. My fondness for ARM assembly stems from my previous work in embedded systems. Currently, I work in finance with domain-specific languages. I’m @keleshev on Twitter.



Illustrations by @PbKatiuska