Recursion with Context
Whilst writing a source code formatter for Lam I stumbled into the problem of when to add parentheses to expressions. I was printing Haskell style type signatures, which sometimes need parentheses to indicate precedence. For example,
f a b is interpreted as
f(a, b) whereas
f (a b) is interpreted as
When printing a type signature you want to add just enough parentheses that the output is unambiguous, but no more than that. This turned out to be tricky! I tried several approaches, each revealing edge cases I hadn’t thought of, until I came up with a more structured approach which I’m fairly happy with. I have a hunch this is quite a general technique that is used elsewhere, but this was the first time I’ve encountered a problem that seems to be a good fit for it.
Type signatures in Lam are represented by this data type (simplified a bit):
The reason for this structure is because it fits in well with another part of the compiler, so I didn’t want to change it just to make printing easier.
Now there are two major syntactic structures in Lam type signatures that you need to deal with: function types (e.g.
a -> b) and type applications (e.g.
f a). Type applications are directly represented by the
:@: constructor, so
f a is
TyVar "f" :@: TyVar "a". There’s no dedicated constructor for function types - instead they’re represented as an application of the type
TyArr to two other types.
a -> b is:
This is a bit clunky to construct, so I have a helper function
fn which is used infix to generate a function type.
We need to add parentheses to separate nested function types and type applications from each other. Here are some examples:
The rules for when to add parentheses aren’t immediately obvious, but after a bit of experimentation I came up with this:
- applications and arrows at the top level don’t get parenthesised
- arrows on the left of arrows get parenthesised
- arrows on the right of arrows don’t get parenthesised
- arrows on either side of applications get parenthesised
- applications on the left of applications don’t get parenthesised
- applications on the right of applications get parenthesised
- applications on either side of arrows don’t get parenthesised
The difficulty with implementing this in a typical recursive way is that there are five different “states” you can be in, and you need to do different things depending on the state and the element you’re looking at. So I tried capturing the “state” explicitly as a sort of surrounding context.
The idea here is that the context tells you where you are in relation to the wider expression, so you can make decisions based on that even though you can’t “see” any of the wider expression at the time. An example should make things clearer, so let’s walk through how this is used.
To print a type, I pass it to my function along with an initial context of
print' examines the context and the type and uses the combination of the two to determine what to do.
We start by matching the case of a function arrow. We print either side, separated by an arrow.
The left hand side (
a) gets the context
ArrL and the RHS gets
We do a similar thing with applications.
That’s all we need to do to ensure that the correct context is propagated through our AST. The next eight patterns are just translations of the rules we wrote above. We use a helper function
parens which wraps its argument in parentheses.
Arrows on the left of arrows get parenthesised:
Notice that in the recursive call we reset the context to
Root, because we’ve gone inside parentheses. This ensures we don’t add unnecessary parentheses such as
((a -> b)) -> a -> b.
Arrows on the right of arrows don’t get parenthesised:
Arrows on either side of applications get parenthesised:
Applications on the left of applications don’t get parenthesised:
Applications on the right of applications get parenthesised:
Applications on either side of arrows don’t
Finally we have the basic case: type variables. We don’t care about the context when printing these.
And that’s it. Quite straightforward and (compared to my earlier attempts) very understandable!
The key idea here is to represent the surrounding context as a data type and pass that down through your recursive calls. This lets you make decisions based on the larger structure simply and efficiently (we traverse the AST in one pass).
I’m certain this technique isn’t new, but it’s the first time I’ve used it and in this case it seems like a very good fit.
I’m using String as the return type here for simplicity but in reality this was an abstract document type from a pretty printing library.↩