Here's a simple case study for the kind of thing I want my metatheory to support, and how it relates to pattern matching and memoization.

Reminder to myself that this has a name: director strings

A common technique when optimizing interpreters is to cache the set of free variables used in an expression, so you can skip the subexpressions where a variable no longer occurs.

It literally takes three lines of code to implement if you believe in functors:

data SyntaxF a
  = Var String
  | Lam String a
  | Add a a
  | Nat Natural
  deriving (Functor, Foldable) -- Foldable provides the foldMap used below

freeVars :: Mu SyntaxF -> Set String
-- Case for variable references
freeVars (Roll (Var v)) = singleton v
-- Case for binders
freeVars (Roll (Lam v e)) = delete v (freeVars e)
-- Case for anything else, as a functor
freeVars (Roll fe) = foldMap freeVars (fe :: SyntaxF _)
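
(For reference, singleton and delete are the usual Data.Set functions, and Mu is the usual fixed point of a functor, i.e. something like:)

newtype Mu f = Roll (f (Mu f))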

But the whole point of it being an optimization is that it needs to be calculated ahead of time, so you aren't paying the cost of the traversal when you want to query it.


If you're doing this in Haskell, you have to make a new datatype or constructor that caches this information, and now it infects all of your pattern matching ever. Everything that deals with expressions now has to worry about free variables, even though it's two lines of code to compute it in general.

data DirStr f a = Cache (Set String) (f a)

cacheFreeVars :: SyntaxF (Mu (DirStr SyntaxF)) -> Mu (DirStr SyntaxF)
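
Here's a minimal sketch of how cacheFreeVars could be implemented, mirroring freeVars above (the local helper cachedVars is just my name for reading the set back off an already-annotated child):

cacheFreeVars fe = Roll (Cache vs fe)
  where
    -- Read the cached set off an already-annotated child
    cachedVars (Roll (Cache s _)) = s
    vs = case fe of
      Var v   -> singleton v
      Lam v e -> delete v (cachedVars e)
      _       -> foldMap cachedVars fe

With a smart constructor like this, the cached tree gets built bottom-up, which is exactly the shape the eval clauses below pattern-match on.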

So the way I see it, there are two solutions:

  1. Add support for constructing/destructing/preserving expressions that operate on a different type, so it just desugars to the right operations on the type you would write out in Haskell by hand, with none of the work
  2. Add support for memoizing pure functions on expressions, so that they are only computed once for a given node

Solution 1

eval (Add (Nat v1) (Nat v2)) = Nat (v1 + v2)
eval (Add (Nat v) r) = eval (Add r (Nat v))
eval (Add l r) = Add (eval l) (eval r)

becomes

eval (Cache _ (Add (Cache _ (Nat v1)) (Cache _ (Nat v2)))) =
  Cache empty (Nat (v1 + v2))
eval (Cache _ (Add (Cache _ (Nat v)) (Cache vs r))) =
  eval (Cache vs (Add (Cache vs r) (Cache empty (Nat v))))
eval (Cache _ (Add l r)) =
  let l'@(Cache ls _) = eval l in
  let r'@(Cache rs _) = eval r in
  Cache (ls <> rs) (Add l' r')

where any time you need to construct a cached version from a non-cached one, cacheFreeVars is inserted and then partially evaluated.
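
For instance (writing things with Roll omitted, like the clauses above), the first clause would initially elaborate to something like the sketch below, and the cacheFreeVars call then partially evaluates to the Cache empty form shown above, since a literal has no free variables:

eval (Cache _ (Add (Cache _ (Nat v1)) (Cache _ (Nat v2)))) =
  cacheFreeVars (Nat (v1 + v2))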

(Hehe, I'm already omitting Roll from Mu; remember how I said that pattern matching in Haskell is too literal and doesn't scale well?)

Solution 2

Solution 2 would end up being literally the same in Haskell, since you can't just tag additional data onto a value of an existing type (and laziness gets in the way of memoizing on the side).

But in a runtime like JavaScript, you can tag additional data onto an object, say with symbols.

const dirStr = Symbol("DirStr");

function freeVars(e) {
  // Return the cached set if this node has already been visited
  if (e[dirStr]) return e[dirStr];
  // Otherwise compute it once and stash it on the node itself
  return e[dirStr] = freeVarsImpl(e);
}

Now you're relying on referential identity for the memoization to actually work, but the goal of the metatheory would be to preserve referential identity. And in fact, having access to director strings would allow you to skip substitutions and thus preserve referential identity more.
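
To make that last point concrete, here's a minimal sketch (back in the Haskell encoding from earlier; subst is my name for it, notMember comes from Data.Set, and capture-avoidance is ignored entirely) of how the cached set lets substitution hand back whole subterms untouched:

subst :: String -> Mu (DirStr SyntaxF) -> Mu (DirStr SyntaxF) -> Mu (DirStr SyntaxF)
subst x arg e@(Roll (Cache vs fe))
  | x `notMember` vs = e  -- x does not occur free below here: reuse the node as-is
  | otherwise = case fe of
      Var _   -> arg      -- x is free here and this is a variable, so it must be x itself
      Lam v b -> cacheFreeVars (Lam v (subst x arg b))  -- naive: ignores capture
      Add l r -> cacheFreeVars (Add (subst x arg l) (subst x arg r))
      Nat _   -> e        -- unreachable: a literal has no free variables

Every time that guard fires, an entire subtree comes back pointer-equal to what went in, which is exactly what the JavaScript-style memoization above needs in order to keep hitting its cache.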

Such a simple thing, but the most important part is that it would be handled implicitly by the metatheory, and not an onerous manual rewrite that impacts the whole codebase.

