Chapter 9
Introduction to Part II

Compiling to Assembly from Scratch
by Vladimir Keleshev

Before extending the compiler, let’s discuss the language we have implemented so far.

Is our language memory-safe? What is memory safety, anyway? Put simply, a language is memory-safe if it does not let you write a program that causes a segmentation fault. The baseline language is memory-safe as long as we call only functions that we have defined ourselves. However, our calling convention also lets us call arbitrary libc functions, and there are creative ways to call them that lead to a segmentation fault (try free(42)). So, unless we do something about that, the baseline language is not memory-safe.

A way to fix that is to introduce a prefix for function labels. For example, a function factorial can be compiled with the label ts$factorial:, and a call to factorial can be compiled to a branch to ts$factorial. A call to free then refers to ts$free, which does not exist, so the program fails to link. This way, you can only call functions that are defined in the source language or that have explicit wrappers written in assembly. The compiler can generate these wrappers automatically and have them handle type conversion, if necessary.
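The prefixing scheme can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions, not the book's actual emitter: the helper names mangle, emitFunctionLabel, emitCall, and emitWrapper are hypothetical, and the wrapper simply branches to the libc function without any type conversion.

```typescript
// Prefix taken from the example in the text.
const PREFIX = "ts$";

// Hypothetical helper: turn a source-level function name into an assembly label.
function mangle(name: string): string {
  return PREFIX + name;
}

// Hypothetical helper: lines that open a function definition.
function emitFunctionLabel(name: string): string[] {
  const label = mangle(name);
  return [`.global ${label}`, `${label}:`];
}

// Hypothetical helper: a call always targets the prefixed label,
// so an unwrapped libc function such as free is unreachable.
function emitCall(callee: string): string {
  return `  bl ${mangle(callee)}`;
}

// Hypothetical helper: a minimal wrapper that exposes one libc function.
// It branches without linking, so the libc function returns
// directly to the original caller. No type conversion is done here.
function emitWrapper(libcName: string): string[] {
  return [`${mangle(libcName)}:`, `  b ${libcName}`];
}

console.log(emitFunctionLabel("factorial").join("\n"));
console.log(emitCall("factorial"));
console.log(emitWrapper("putchar").join("\n"));
```

With this scheme, emitCall("free") produces a branch to ts$free, which resolves only if a wrapper for free was emitted deliberately. The entry point is a likely exception: the C runtime looks for a symbol named main, so a real compiler would presumably either leave main unprefixed or emit a main stub that branches to ts$main.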

Is our language dynamically typed? Or is it statically typed? Both and neither! The baseline language supports only integer numbers. So, it could be thought of as a dynamically typed language with only one data type, or as a statically typed language with one static type. But we are about to change this.

Before we explore static and dynamic typing, though, we need more than one data type in our language. We will start by introducing booleans and undefined, and then arrays. First, we will introduce them in an unsafe, untyped manner, and then we will apply static and dynamic treatment to them.
