Compiling to Assembly from Scratch

Buy ebook • $45

So, you’ve been trying to learn how compilers and programming languages work?

Perhaps you’ve learned about compiling to JavaScript, or about building an interpreter? Or maybe about compiling to bytecode? All good steps.

But there’s a tension building up.

Because it feels a bit like cheating. Because you know that somewhere, somehow, the code you write is translated to assembly instructions. To the machine language. That’s where the rubber hits the road. That’s where it gets hot. And, oh-so-many resources are hesitant to cover this part. But not this book.

This ebook will show you in detail how you can build a compiler from scratch that goes all the way from source to assembly.

The example code is written in a subset of TypeScript that reads like pseudocode. The book describes the design and implementation of a compiler that emits 32-bit ARM assembly instructions. All you need to know is how to program; the book will teach you enough compiler theory and assembly programming to get going.

Buy now and get the following:


Buy ebook • $45
Excl. EU VAT







Why ARM?

In many ways, the ARM instruction set is what makes this book possible.

Compared to Intel x86-64, the ARM instruction set is a work of art.

Intel x86-64 is the result of evolution from an 8-bit processor, to a 16-bit one, then to a 32-bit one, and finally to a 64-bit one. At each step of the evolution, it accumulated complexity and cruft. At each step, it tried to satisfy conflicting requirements.

Guess which one is an easier target for a compiler?

If this book targeted Intel x86-64 instead of ARM, it would have been twice as long and, more likely, never written. Also, with 160 billion devices shipped, we had better get used to the fact that ARM is the dominant instruction set architecture today.

In other words… ARM is a good start. After learning it, you will be better equipped for moving to x86-64 or the new ARM64.

Will you be able to run the code your compiler produces?

I bet you will! The Appendix describes several ways to execute ARM code, from a Raspberry Pi or a cloud VM to emulating ARM on Linux and Windows.

Why TypeScript?

First of all, you will be able to follow this book in any reasonable programming language. For me, it was tough to pick one for this job, and I’m pleased I chose TypeScript.

TypeScript is probably nobody’s favorite, but it’s a good compromise:

Don’t worry if you’ve never seen TypeScript code before. If you can read the following, you will most likely be able to pick it up as the book goes along (real code from the book here!):

class Label {
  static counter = 0;
  value: number; // Type annotation

  constructor() {
    this.value = Label.counter++;
  }

  toString() {
    return '.L' + this.value;
  }
}

I avoided using any TypeScript- or JavaScript-specific language features in the code.

If you’re into statically-typed functional programming languages (Haskell, OCaml, or ReasonML), you will find that the class structure I used has a nice translation to an algebraic data type. It is, in fact, how I wrote it first.
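To make the class-to-algebraic-data-type correspondence concrete, here is a minimal sketch. The node names (`NumberNode`, `AddNode`, `Expr`) and the `evaluate` operation are hypothetical and chosen only for illustration; they are not taken from the book. The second half uses a tagged union, which is a TypeScript-specific feature of the kind the book's code avoids, so treat it as a side note rather than the book's style.

```typescript
// Class-based style: one class per node, one method per operation.
// All names here are hypothetical, for illustration only.
interface Expr {
  evaluate(): number;
}

class NumberNode implements Expr {
  value: number;

  constructor(value: number) {
    this.value = value;
  }

  evaluate() {
    return this.value;
  }
}

class AddNode implements Expr {
  left: Expr;
  right: Expr;

  constructor(left: Expr, right: Expr) {
    this.left = left;
    this.right = right;
  }

  evaluate() {
    return this.left.evaluate() + this.right.evaluate();
  }
}

// ADT-like style: one tagged union, one function with a case per variant.
// Roughly what a Haskell or OCaml data type with pattern matching looks like.
type ExprData =
  | { kind: "number"; value: number }
  | { kind: "add"; left: ExprData; right: ExprData };

function evaluate(e: ExprData): number {
  switch (e.kind) {
    case "number":
      return e.value;
    case "add":
      return evaluate(e.left) + evaluate(e.right);
  }
}

// The same expression, 2 + (3 + 4), built both ways.
const viaClasses = new AddNode(
  new NumberNode(2),
  new AddNode(new NumberNode(3), new NumberNode(4)),
).evaluate();

const viaUnion = evaluate({
  kind: "add",
  left: { kind: "number", value: 2 },
  right: {
    kind: "add",
    left: { kind: "number", value: 3 },
    right: { kind: "number", value: 4 },
  },
});
```

Each class corresponds to one variant of the union, and each method corresponds to one function that matches on the variant.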

Book Contents

The book consists of two parts. Part I presents a detailed, step-by-step guide on how to develop a small “baseline” compiler that can compile simple programs to ARM assembly.

By the end of Part I, you will have a working compiler that can compile simple functions like this one:

function factorial(n) {
  if (n == 0) {
    return 1;
  } else {
    return n * factorial(n - 1);
  }
}

Into ARM assembly code like this:

.global factorial
factorial:
  push {fp, lr}
  mov fp, sp
  push {r0, r1}
  ldr r0, =0
  push {r0, ip}
  ldr r0, [fp, #-8]
  pop {r1, ip}
  cmp r0, r1
  moveq r0, #1
  movne r0, #0
  cmp r0, #0
  beq .L1
  ldr r0, =1
  b .L2
.L1:
  ldr r0, =1
  mov r1, r0
  ldr r0, [fp, #-8]
  sub r0, r0, r1
  bl factorial
  mov r1, r0
  ldr r0, [fp, #-8]
  mul r0, r0, r1
.L2:
  mov sp, fp
  pop {fp, pc}

This code won’t win any awards, and an optimizing compiler could do much better, but it’s a start!
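Where do labels such as `.L1` and `.L2` come from? The following is a minimal sketch of how a code generator might emit the compare-and-branch pattern for an if/else, reusing the `Label` class shown earlier. The `AST` interface, the `emit` method, and the `Num` and `If` node classes are assumptions made for this illustration and are not necessarily how the book structures its compiler.

```typescript
class Label {
  static counter = 0;
  value: number;

  constructor() {
    this.value = Label.counter++;
  }

  toString() {
    return '.L' + this.value;
  }
}

// Hypothetical node interface: each node appends assembly lines to `out`
// and leaves its result in register r0.
interface AST {
  emit(out: string[]): void;
}

class Num implements AST {
  value: number;

  constructor(value: number) {
    this.value = value;
  }

  emit(out: string[]) {
    out.push(`  ldr r0, =${this.value}`);
  }
}

class If implements AST {
  conditional: AST;
  consequence: AST;
  alternative: AST;

  constructor(conditional: AST, consequence: AST, alternative: AST) {
    this.conditional = conditional;
    this.consequence = consequence;
    this.alternative = alternative;
  }

  emit(out: string[]) {
    // Fresh labels, so that nested or repeated ifs never clash.
    const ifFalse = new Label();
    const endIf = new Label();
    this.conditional.emit(out);
    out.push('  cmp r0, #0');     // Zero in r0 means "false"
    out.push(`  beq ${ifFalse}`); // Skip the consequence when false
    this.consequence.emit(out);
    out.push(`  b ${endIf}`);     // Jump over the alternative
    out.push(`${ifFalse}:`);
    this.alternative.emit(out);
    out.push(`${endIf}:`);
  }
}

// Usage: emit code for `if (0) 1 else 2`.
const lines: string[] = [];
new If(new Num(0), new Num(1), new Num(2)).emit(lines);
console.log(lines.join('\n'));
```

Because every `If` asks for two new labels, the label numbers simply reflect the order in which the generator requested them.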

Part II talks about more advanced topics in less detail. It explores several different (often mutually exclusive) directions in which you can take your compiler.




At the moment, I’m working on bringing a print edition of the book to reality, but there’s no timeline for it yet. Print is a bit harder than digital: I need to be 100% confident about each word in the book before it is sent to press. I also know that nobody likes buying the same book twice. So, if you buy the ebook today, I promise to give you a discount once the print edition is out.

Can’t afford it?

If, due to circumstances, you can’t afford to buy the book, write me an email at vladimir@keleshev.com, and I’m sure we can figure something out!

About me

My name is Vladimir Keleshev. I have worked with compilers both commercially and in open source. My fondness for ARM assembly stems from my previous work in embedded systems. Currently, I work in finance with domain-specific languages.

Questions?

Contact me at vladimir@keleshev.com or on Twitter at @keleshev.



Illustrations by @PbKatiuska