On the one hand, when we write a compiler we want to split it into many compiler passes to make each pass more tractable and testable. On the other hand, we want to minimize the number of AST traversals to improve performance. Automatic pass fusion is about combining several passes into one traversal to address both concerns. This article presents one way to do that.
A previous post, map as a recursion scheme, showed how to use the map function to reduce boilerplate when writing compiler passes. In this post, we build upon that work, including the toy language and the AST type, that we refer here as t
.
The language is as follows:
e → (
e)
| ()
| true
| false
| 0
| 1
| 2
|
…
|
id
|
e /
e
|
e;
e
| let
id =
e in
e
| if
e then
e else
e
Here is the type we use to represent its syntax:
type t =
| Unitof bool
| Boolean of int
| Number of string
| Id of t * t
| Divide of t * t
| Sequence of {id: string; value: t; body: t}
| Let of {conditional: t; consequence: t; alternative: t} | If
We’ll take three language transformations to create motivation for our passes. Then we’ll implement them in three different ways:
As we go, we’ll compare the code, and in the end, we’ll compare the performance too.
Our first transformation is dead code elimination. It removes redundant branches of the if statement, in case the condition is hard-coded:
if true then x else y ⤇ x
if false then x else y ⤇ y
Next one is constant propagation where we compute statically known fractions, for example:
42 / 2 ⤇ 21
Next is removing redundant let, in case a let binding is used immediately in its own body:
let x = y in x ⤇ y
First, let’s implement the three passes separately, using the map function as the recursion scheme.
module Not_fused = struct
module Dead_code_elimination = struct
let rec pass = function
true; consequence; _} ->
| If {conditional=Boolean
pass consequencefalse; alternative; _} ->
| If {conditional=Boolean
pass alternative
| other ->
map pass otherend
module Constant_propagation = struct
let rec pass = function
when m <> 0 ->
| Divide (Number n, Number m)
pass (Number (n / m))
| other ->
map pass otherend
module Remove_redundant_let = struct
let rec pass = function
when n = name ->
| Let {name; value; body=Name n}
pass value
| other ->
map pass otherend
let pass t =
Remove_redundant_let.pass
(Constant_propagation.pass
(Dead_code_elimination.pass t))end
In the end, we combined the passes using function composition.
Now, let’s try to combine the three passes into a single traversal manually.
module Manually_fused = struct
let rec pass = function
true; consequence; _} ->
| If {conditional=Boolean
pass consequencefalse; alternative; _} ->
| If {conditional=Boolean
pass alternativewhen m <> 0 ->
| Divide (Number n, Number m)
pass (Number (n / m))when n = name ->
| Let {name; value; body=Name n}
pass value
| other ->
map pass otherend
Note that Not_fused.pass
and Manually_fused.pass
are not the same function. Since Not_fused.pass
works in three separate passes, an earlier pass might uncover more optimization opportunities for the next passes. For example, consider a syntax tree corresponding to a fragment of our toy language:
10 / (if true then 2 else 5)
The Not_fused.pass
will first run Dead_code_elimination.pass
to obtain 10 / 2
, and then separately run Constant_propagation.pass
to get the result—5
. At the same time, Manually_fused.pass
dives straight into dead code elimination, without reconsiderations and ends up with 10 / 2
as the result.
However, both functions are not perfect and do not uncover all the optimization opportunities. For example, consider a syntax tree for the following fragment:
10 / (let x = 2 in x)
Both Not_fused.pass
and Manually_fused.pass
evaluate this to 10 / 2
. So both miss some optimizations, and the only way to cover them all is to call the function repeatedly until a fixpoint value emerges.
Here’s an implementation style that allows writing the three passes separately, and then fuse them into a single traversal.
module Fused = struct
module Dead_code_elimination = struct
let pass first_pass next_pass = function
true; consequence; _} ->
| If {conditional=Boolean
first_pass consequencefalse; alternative; _} ->
| If {conditional=Boolean
first_pass alternative
| other ->
next_pass otherend
module Constant_propagation = struct
let pass first_pass next_pass = function
when m <> 0 ->
| Divide (Number n, Number m)
first_pass (Number (n / m))
| other ->
next_pass otherend
module Remove_redundant_let = struct
let pass first_pass next_pass = function
when n = name ->
| Let {name; value; body=Name n}
first_pass value
| other ->
next_pass otherend
(* Fuse the passes together *)
let rec pass t =
(Dead_code_elimination.pass pass
(Constant_propagation.pass pass
(Remove_redundant_let.pass pass
(map pass)))) tend
Here, each sub-pass, instead of being directly recursive, relies on open recursion. Each takes two parameters: first_pass
and next_pass
. If none of the patterns in the pass match, then it delegates the work to the next_pass
. However, if the pass needs to recur on a nested expression, it calls first_pass
, to re-start the pipeline.
The sub-passes are combined as follows. Each consecutive pass is the next one’s next_pass
argument, ending with map pass
which ties the first recursive knot. The combined pass is also recursively passed to each function as the first_pass
parameter, tying the second recursive knot.
Fixpoint combinator and function composition operator could be used here, but they affect the performance of the resulting pass (probably due to closure allocation after partial application).
The resulting function Fused.pass
is identical to Manually_fused.pass
for all inputs. We can use Imandra proof assistant to check this:
#use "fusion.ml"
theorem our_fusion_is_the_same_as_manual_fusion x =
Fused.pass x = Manually_fused.pass x
[@@auto]
When running this, Imandra will print several pages of proofs, and in the end will conclude:
[✓] Theorem proved.
After generating 100_000_000
random syntax trees and measuring the total time of applying the three different techniques, we get the results in seconds:
We can see that our pass fusion technique achieves most of the performance benefit of manual pass fusion, while maintaining independent implementation of each sub-pass.
Having independent sub-passes allows a hypothetical pass manager to arrange them differently, depending on the optimization level. For a fast unoptimized build, it can fuse as many passes as possible into one pipeline. For slower optimized builds it can instantiate each sub-pass into a full-fledged pass and even run some of them repeatedly.
Another benefit is that each sub-pass can be tested in isolation.
The gist contains the code from this article together with additional testing and benchmarking code.
All passes that we discussed have the form t -> t
. It would be interesting to see how this technique fairs for passes of form t -> t m
for some monad m
, or for passes from one AST to a different one.
The passes we discussed used disjoint patterns. It would be interesting to see if this could be adapted for passes with overlapping patterns. ■