Automatic Compiler Pass Fusion

2019-09-22

On the one hand, when we write a compiler we want to split it into many compiler passes to make each pass more tractable and testable. On the other hand, we want to minimize the number of AST traversals to improve performance. Automatic pass fusion is about combining several passes into one traversal to address both concerns. This article presents one way to do that.

A previous post, map as a recursion scheme, showed how to use the map function to reduce boilerplate when writing compiler passes. In this post, we build upon that work, including the toy language and the AST type, that we refer here as t.

The language is as follows:

(e)
| ()
| true | false
| 0 | 1 | 2 |
| id
| e / e
| e; e
| let id = e in e
| if e then e else e

Here is the type we use to represent its syntax:

type t =
  | Unit
  | Boolean of bool
  | Number of int
  | Id of string
  | Divide of t * t
  | Sequence of t * t
  | Let of {id: string; value: t; body: t}
  | If of {conditional: t; consequence: t; alternative: t}

Overview

We’ll take three language transformations to create motivation for our passes. Then we’ll implement them in three different ways:

As we go, we’ll compare the code, and in the end, we’ll compare the performance too.

Transformations

If you’ve read this blog before, then you are familiar with these transformations. We’ve used them before.

Our first transformation is dead code elimination. It removes redundant branches of the if statement, in case the condition is hard-coded:

if true  then x else y  ⇒  x
if false then x else y  ⇒  y

Next one is constant propagation where we compute statically known fractions, for example:

42 / 2  ⇒  21

Next is removing redundant let, in case a let binding is used immediately in its own body:

let x = y in x  ⇒  y

No Fusion

First, let’s implement the three passes separately, using the map function as the recursion scheme.

module Not_fused = struct
  module Dead_code_elimination = struct
    let rec pass = function
      | If {conditional=Boolean true; consequence; _} ->
          pass consequence
      | If {conditional=Boolean false; alternative; _} ->
          pass alternative
      | other ->
          map pass other
  end

  module Constant_propagation = struct
    let rec pass = function
      | Divide (Number n, Number m) when m <> 0 ->
          pass (Number (n / m))
      | other ->
          map pass other
  end

  module Remove_redundant_let = struct
    let rec pass = function
      | Let {name; value; body=Name n} when n = name ->
          pass value
      | other ->
          map pass other
  end

  let pass t =
    Remove_redundant_let.pass
      (Constant_propagation.pass
        (Dead_code_elimination.pass t))
end


In the end, we combined the passes using function composition.

Manual Fusion

Now, let’s try to combine the three passes into a single traversal manually.

module Manually_fused = struct
  let rec pass = function
    | If {conditional=Boolean true; consequence; _} ->
        pass consequence
    | If {conditional=Boolean false; alternative; _} ->
        pass alternative
    | Divide (Number n, Number m) when m <> 0 ->
        pass (Number (n / m))
    | Let {name; value; body=Name n} when n = name ->
        pass value
    | other ->
        map pass other
end
Note that Not_fused.pass and Manually_fused.pass are not the same function. Since Not_fused.pass works in three separate passes, an earlier pass might uncover more optimization opportunities for the next passes. For example, consider a syntax tree corresponding to a fragment of our toy language:
10 / (if true then 2 else 5)

The Not_fused.pass will first run Dead_code_elimination.pass to obtain 10 / 2, and then separately run Constant_propagation.pass to get the result—5. At the same time, Manually_fused.pass dives straight into dead code elimination, without reconsiderations and ends up with 10 / 2 as the result.

However, both functions are not perfect and do not uncover all the optimization opportunities. For example, consider a syntax tree for the following fragment:

10 / (let x = 2 in x)

Both Not_fused.pass and Manually_fused.pass evaluate this to 10 / 2. So both miss some optimizations, and the only way to cover them all is to call the function repeatedly until a fixpoint value emerges.

Automatic Fusion

Here’s an implementation style that allows writing the three passes separately, and then fuse them into a single traversal.

module Fused = struct
  module Dead_code_elimination = struct
    let pass first_pass next_pass = function
      | If {conditional=Boolean true; consequence; _} ->
          first_pass consequence
      | If {conditional=Boolean false; alternative; _} ->
          first_pass alternative
      | other ->
          next_pass other
  end

  module Constant_propagation = struct
    let pass first_pass next_pass = function
      | Divide (Number n, Number m) when m <> 0 ->
          first_pass (Number (n / m))
      | other ->
          next_pass other
  end

  module Remove_redundant_let = struct
    let pass first_pass next_pass = function
      | Let {name; value; body=Name n} when n = name ->
          first_pass value
      | other ->
          next_pass other
  end

  (* Fuse the passes together *)
  let rec pass t =
    (Dead_code_elimination.pass pass
      (Constant_propagation.pass pass
        (Remove_redundant_let.pass pass
          (map pass)))) t
end


Here, each sub-pass, instead of being directly recursive, relies on open recursion. Each takes two parameters: first_pass and next_pass. If none of the patterns in the pass match, then it delegates the work to the next_pass. However, if the pass needs to recur on a nested expression, it calls first_pass, to re-start the pipeline.

The sub-passes are combined as follows. Each consecutive pass is the next one’s next_pass argument, ending with map pass which ties the first recursive knot. The combined pass is also recursively passed to each function as the first_pass parameter, tying the second recursive knot.

Fixpoint combinator and function composition operator could be used here, but they affect the performance of the resulting pass (probably due to closure allocation after partial application).

Proof

The resulting function Fused.pass is identical to Manually_fused.pass for all inputs. We can use Imandra proof assistant to check this:

#use "fusion.ml"

theorem our_fusion_is_the_same_as_manual_fusion x =
  Fused.pass x = Manually_fused.pass x
[@@auto]

When running this, Imandra will print several pages of proofs, and in the end will conclude:

[✓] Theorem proved.

Benchmark

After generating 100_000_000 random syntax trees and measuring the total time of applying the three different techniques, we get the results in seconds:


Conclusion

We can see that our pass fusion technique achieves most of the performance benefit of manual pass fusion, while maintaining independent implementation of each sub-pass.

Having independent sub-passes allows a hypothetical pass manager to arrange them differently, depending on the optimization level. For a fast unoptimized build, it can fuse as many passes as possible into one pipeline. For slower optimized builds it can instantiate each sub-pass into a full-fledged pass and even run some of them repeatedly.

Another benefit is that each sub-pass can be tested in isolation.

Code

The gist contains the code from this article together with additional testing and benchmarking code.

Futher work

All passes that we discussed have the form t -> t. It would be interesting to see how this technique fairs for passes of form t -> t m for some monad m, or for passes from one AST to a different one.

The passes we discussed used disjoint patterns. It would be interesting to see if this could be adapted for passes with overlapping patterns.

Follow me on Twitter