On the one hand, when we write a compiler we want to split it into many compiler passes to make each pass more tractable and testable. On the other hand, we want to minimize the number of AST traversals to improve performance. Automatic pass fusion is about combining several passes into one traversal to address both concerns. This article presents one way to do that.
A previous post,
map as a recursion scheme,
showed how to use the map function
to reduce boilerplate when writing compiler passes. In this post, we build upon
that work, including the toy language and the AST type, that we refer here as
t
.
The language is as follows:
e → (
e)
| ()
| true
| false
| 0
| 1
| 2
|
…
|
id
|
e /
e
|
e;
e
| let
id =
e in
e
| if
e then
e else
e
Here is the type we use to represent its syntax:
type t = | Unit | Boolean of bool | Number of int | Id of string | Divide of t * t | Sequence of t * t | Let of {id: string; value: t; body: t} | If of {conditional: t; consequence: t; alternative: t}
We’ll take three language transformations to create motivation for our passes. Then we’ll implement them in three different ways:
As we go, we’ll compare the code, and in the end, we’ll compare the performance too.
If you’ve read this blog before, then you are familiar with these transformations. We’ve used them before.
Our first transformation is dead code elimination. It removes redundant branches of the if statement, in case the condition is hard-coded:
if true then x else y ⇒ x if false then x else y ⇒ y
Next one is constant propagation where we compute statically known fractions, for example:
42 / 2 ⇒ 21
Next is removing redundant let, in case a let binding is used immediately in its own body:
let x = y in x ⇒ y
First, let’s implement the three passes separately, using the map function as the recursion scheme.
module Not_fused = struct module Dead_code_elimination = struct let rec pass = function | If {conditional=Boolean true; consequence; _} -> pass consequence | If {conditional=Boolean false; alternative; _} -> pass alternative | other -> map pass other end module Constant_propagation = struct let rec pass = function | Divide (Number n, Number m) when m <> 0 -> pass (Number (n / m)) | other -> map pass other end module Remove_redundant_let = struct let rec pass = function | Let {name; value; body=Name n} when n = name -> pass value | other -> map pass other end let pass t = Remove_redundant_let.pass (Constant_propagation.pass (Dead_code_elimination.pass t)) end
In the end, we combined the passes using function composition.
Now, let’s try to combine the three passes into a single traversal manually.
module Manually_fused = struct let rec pass = function | If {conditional=Boolean true; consequence; _} -> pass consequence | If {conditional=Boolean false; alternative; _} -> pass alternative | Divide (Number n, Number m) when m <> 0 -> pass (Number (n / m)) | Let {name; value; body=Name n} when n = name -> pass value | other -> map pass other endNote that
Not_fused.pass
and Manually_fused.pass
are not the same
function. Since Not_fused.pass
works in three separate passes, an earlier
pass might uncover more optimization opportunities for the next passes.
For example, consider a syntax tree corresponding to a fragment of our
toy language:
10 / (if true then 2 else 5)
The Not_fused.pass
will first run Dead_code_elimination.pass
to obtain
10 / 2
, and then separately run Constant_propagation.pass
to get the
result—5
. At the same time, Manually_fused.pass
dives straight into
dead code elimination, without reconsiderations and ends up with 10 / 2
as the result.
However, both functions are not perfect and do not uncover all the optimization opportunities. For example, consider a syntax tree for the following fragment:
10 / (let x = 2 in x)
Both Not_fused.pass
and Manually_fused.pass
evaluate this to 10 / 2
.
So both miss some optimizations, and the only way to cover them all is to
call the function repeatedly until a fixpoint value emerges.
Here’s an implementation style that allows writing the three passes separately, and then fuse them into a single traversal.
module Fused = struct module Dead_code_elimination = struct let pass first_pass next_pass = function | If {conditional=Boolean true; consequence; _} -> first_pass consequence | If {conditional=Boolean false; alternative; _} -> first_pass alternative | other -> next_pass other end module Constant_propagation = struct let pass first_pass next_pass = function | Divide (Number n, Number m) when m <> 0 -> first_pass (Number (n / m)) | other -> next_pass other end module Remove_redundant_let = struct let pass first_pass next_pass = function | Let {name; value; body=Name n} when n = name -> first_pass value | other -> next_pass other end (* Fuse the passes together *) let rec pass t = (Dead_code_elimination.pass pass (Constant_propagation.pass pass (Remove_redundant_let.pass pass (map pass)))) t end
Here, each sub-pass, instead of being directly recursive, relies on open
recursion. Each takes two parameters: first_pass
and next_pass
. If none of
the patterns in the pass match, then it delegates the work to the next_pass
.
However, if the pass needs to recur on a nested expression, it calls
first_pass
, to re-start the pipeline.
The sub-passes are combined as follows. Each consecutive pass is the next one’s
next_pass
argument, ending with map pass
which ties the first recursive knot.
The combined pass is also recursively passed to each function as the first_pass
parameter, tying the second recursive knot.
Fixpoint combinator and function composition operator could be used here, but they affect the performance of the resulting pass (probably due to closure allocation after partial application).
The resulting function Fused.pass
is identical to Manually_fused.pass
for all inputs. We can use Imandra
proof assistant to check this:
#use "fusion.ml" theorem our_fusion_is_the_same_as_manual_fusion x = Fused.pass x = Manually_fused.pass x [@@auto]
When running this, Imandra will print several pages of proofs, and in the end will conclude:
[✓] Theorem proved.
After generating 100_000_000
random syntax trees and measuring
the total time of applying the three different techniques, we get the
results in seconds:
We can see that our pass fusion technique achieves most of the performance benefit of manual pass fusion, while maintaining independent implementation of each sub-pass.
Having independent sub-passes allows a hypothetical pass manager to arrange them differently, depending on the optimization level. For a fast unoptimized build, it can fuse as many passes as possible into one pipeline. For slower optimized builds it can instantiate each sub-pass into a full-fledged pass and even run some of them repeatedly.
Another benefit is that each sub-pass can be tested in isolation.
The gist contains the code from this article together with additional testing and benchmarking code.
All passes that we discussed have the form t -> t
.
It would be interesting to see how this technique fairs
for passes of form t -> t m
for some monad m
, or for passes
from one AST to a different one.
The passes we discussed used disjoint patterns. It would be interesting to see if this could be adapted for passes with overlapping patterns. ■
⁂
Follow me on Twitter