Implementing a Functional Language II: Graph Reduction
This is part 2 of a series in implementing a functional language. The introduction is here.
Every compilation strategy we will cover in this series will be based on the same core concept: lazy graph reduction. This idea is simple but powerful, and it’s what we’ll introduce in this section.
A functional program is really just a single large expression which evaluates to a result.
We can model it as a graph1 of nested applications. For example, the expression f 1 2
looks like this:
We use the symbol @
to denote an application node. Note that though f
takes two
arguments, we apply each in turn via currying. Here’s a more complicated example - the
graph of (+ (* 2 3) 4)
:
To evaluate the program, we’ll repeatedly reduce expressions in the graph. We’ll stop when there are no expressions left to reduce. The resulting graph is the result of the program. Reducing an expression typically involves applying a function to one or more arguments, producing a result, and then replacing the expression with the result.
Not all expressions can be reduced: + 1 2
reduces to 3
but 3
does not reduce any
further. Similarly, + 1
does not reduce because +
requires two arguments. An
expression which is reducible is known as a reducible expression, or redex. To evaluate an
entire program, all we need to do is repeatedly identify the next redex to reduce, and
then reduce it, stopping when there are no redexes left.
Let’s try to reduce the graph above. We start at the top, with an application node @
.
The child nodes are @
(another application) and 4
. We can’t immediately evaluate this
expression, so we proceed down into the child application node, which itself has children
+
and @
. +
takes two arguments, in this case (* 2 3)
and 4
. +
also requires
that both its arguments are evaluated, so we need to continue to reduce the first
argument. We descend into the right hand application node - the root of (* 2 3)
. Here’s
where we are in the graph:
As before, we descend into the child application node, finally reaching *
. *
takes two
numeric arguments, and we have two arguments in 2
and 3
. We’ve found a reducible
expression! Specifically, the subgraph
can be reduced to the single numeric node 6
. The node marked in red is the root of
the redex. Performing the reduction, the graph now looks like this:
Notice that the expression (* 2 3)
has been replaced with the result of its reduction
(6
). Proceeding back up graph, we can see that the arguments to +
are now both
evaluated, so we’ve found another redex. Reducing this, we end up with a single node 10
,
which is the result of the whole expression.
Reduction order
The order in which we reduce expressions in a program has a profound effect on the behaviour of the program. The approach we used in reducing the program above is called normal order reduction. Normal order reduction requires that we reduce the leftmost outermost redex first.
One consequence of this is that we reduce applications before reducing their arguments. This is in contrast to a typical imperative language, where arguments to functions are evaluated before the function is called. Take for example the following Core program:
K x y = x
main = K 1 (/ 1 0)
where we can assume that (/ 1 0)
will raise an error if evaluated. Following normal
order reduction, we will reduce the application of K
first, resulting in the following:
main = 1
Since the second argument to K
is never used, we never evaluate it. In a typical
imperative language, we would evaluate (/ 1 0)
first, resulting in an error. This style
of execution is called lazy evaluation, in contrast to strict evaluation. The two key
properties of lazy evaluation are the following:
- Arguments to functions are only evaluated when needed.
- Each argument is only evaluated once - further uses of the argument will use the result from the initial evaluation.
The second property doesn’t affect program behaviour but greatly improves efficiency.
Normal form and Weak head normal form
Normal order reduction specifies that we reduce the leftmost outermost redex first,
but it doesn’t specify when to stop. A natural assumption is to stop when there are no
redexes left, but this is not our only option. If the output of our program is being
printed to the screen, and we want to show progress as we go (or we’re printing an
infinite stream of values) then we want to be able to produce output without having fulling
evaluated it yet. Imagine a list made from a series of Cons
cells linked together: we
may want to print the first element before having evaluated the whole list.
To do this, we need to stop reducing when there is no longer a top-level redex. This will allow us to inspect the structure and decide what to evaluate next. An expression which has no top-level redexes (but may have inner redexes left) is in weak head normal form (WHNF). An expression which has no redexes at all is in normal form (NF). All expressions in NF are also in WHNF, but not vice versa. Here are some examples.
Normal Form | Weak Head Normal Form |
---|---|
3 |
3 |
(+ 1) |
(+ 1) |
(+ (+ 2 3)) |
For Core we’ll follow lazy evaluation and evaluate expressions to WHNF. For some built in
functions (like +
) we will adopt strict semantics, requiring arguments to be fully
evaluated to normal form. For all user-defined functions we’ll use lazy semantics.
Reducing programs
Let’s walk through the evaluation of the following program:
square x = * x x ;
main = square (square 3)
To start with, the graph of our program consists of just the node main
. For reference
we’ve also drawn the graph of the supercombinator square
, though we don’t evaluate this
until it appears in main
.
main
is a
supercombinator with no arguments, so it is a redex. We reduce it by replacing it with its
body, yielding the following graph:
The outermost redex is the top node - the outer application of square
. To reduce it, we
replace the redex with an instance of the supercombinator body, substituting any
parameters with a pointer to the argument of the application. This gives us:
We see that the inner redex (square 3)
is now shared between two application nodes.
Notice that this transformation has resulted in a true graph rather than a tree. The
subgraph coloured blue is the instantiated body of square
.
The application of *
cannot be reduced because *
is a strict primitive and requires
both its arguments to be evaluated first. Hence the only redex is the inner (square 3)
.
This yields:
The subgraph coloured blue is the second instantiation of square
. The only redex is now
the inner multiplication, so we reduce that.
And now we can reduce the outer multiplication, which directly yields 81
.
So the general steps are:
- Find the next redex
- Reduce it
- Update the root of the redex with the result
Unwinding the spine
To find the next redex, we need to find the leftmost outermost function application. To do that we follow these steps:
- From the root of the graph, follow the left branch until you reach a supercombinator or primitive.
- Check how many arguments the supercombinator or primitive takes and go back up the graph that number of times - you’re now at the root of the outermost function application.
The chain of left-branching application nodes is called the spine, and the process of traversing it like this is called known as unwinding the spine. The function arguments are typically stored on a stack to make them easy to access.
If the function is a supercombinator, then we’ve found a redex. If it’s a primitive, then we may first need to reduce the arguments to the function before the application becomes reducible.
If we need to reduce an argument, we must put the current stack to one side and begin unwinding again from the root of the argument. We may need to repeat this if the argument contains further unevaluated function applications. To keep track of these stacks we use a stack of stacks, known as a dump. When we need to evaluate an argument we push the current stack on to the dump, and when we’ve finished evaluating it we pop the old stack off the dump.
Reducing let(rec)
expressions
A supercombinator is reduced by substituting arguments into its body. If there are
let(rec)
expressions in the body, they are represented as paths in the graph. For
example:
let y = 3
in + y y
is represented as
The let
expression defines a sub-expression 3
, which is named y
. The body of the
let
expression references y
via pointers to 3
. In this way, the single instance of
y
is shared between the two arguments to +
.
Let’s look at a more complex example: the reduction of a supercombinator application.
f x = let y = * x x
in + y y
main = f 3
We start with the main
supercombinator:
which reduces to
which reduces to
We can see that both arguments to +
point to the same sub-expression (labelled y
) and
both arguments to *
point to the same instance of 3
. The outermost application is that
of +
, but it requires both of its arguments to be evaluated first. Both arguments are
the inner application of *
, which we can reduce. This gives us the following:
which reduces to
Reducing supercombinator applications
When reducing a supercombinator application, there are two things we must consider:
- The argument may contain redexes, so we want to avoid copying it.
- The redex may be shared, so we want to update it with its result after reduction.
To do this we construct a new instance of the supercombinator body, substituting a pointer to the argument in place of the function parameter. This avoids having to copy the argument, which may be large. Once we’ve reduced a redex, we overwrite the root of the redex with the result of the reduction. This ensures that any other references to it will not have to reduce it again.
There’s one case we need to be careful of, however. Consider the following Core program:
id x = x
main = let y = 4
in * (id y) y
After some reduction, the graph of this program looks like this:
Let’s consider the reduction of (id 4)
. If we were to naively update the root of the
redex with the result, we’d end up with the following.
We’ve duplicated 4
! If this was a large expression, we would be at risk of doing extra
work reducing it twice. To get around this, we can add a new type of node: an indirection
node. This node simply points to another node, and we can use it to update the root of
the redex without duplicating the result. Using an indirection node (marked by a #
) we
get the following:
We can then reduce this in the normal way. When we encounter an indirection node we simply
skip over it and look at its target. In one step this reduces to 16
.
Those are the basics of graph reduction. In the next section we’ll apply this theory to our first compiler: the template instantiation machine.
In this case we could call it a tree, but trees are just a specific type of graph. Later on we’ll see that certain transformations on the tree will make it no longer a valid tree. So for simplicity we’ll refer to everything as a graph.↩︎