How to add fields that only cache something to ADT?

https://stackoverflow.com/questions/13098448

14-07-2021
|

Question

Often I'm in the need of adding fields to an ADT that only memoize some redundant information. But I haven't figured out completely how to do it nicely and efficiently.

The best way to show the problem is to make an example. Suppose we're working with untyped lambda terms:

type VSym = String

data Lambda = Var VSym 
            | App Lambda Lambda
            | Abs VSym Lambda

And from time to time we need to compute the set of free variables of a term:

fv :: Lambda -> Set VSym
fv (Var v)    = Set.singleton v
fv (App s t)  = (fv s) `Set.union` (fv t)
fv (Abs v t)  = v `Set.delete` (fv t)

Soon we realize that repeated computations of fv are a bottleneck of our application. We would like to add it to the data type somehow. Like:

data Lambda1 = Var (Set VSym) VSym
             | App (Set VSym) Lambda Lambda
             | Abs (Set VSym) VSym Lambda

But it makes the definition quite ugly. Almost (Set VSym) takes more space than all the rest. Moreover, it breaks pattern matching in all functions that use Lambda. And to make things worse, if we later decide to add some other memoizing field, we'll have to rewrite all patterns again.

How to design a general solution that allows adding such memoizing fields easily and unobtrusively? I'd like to reach the following goals:

The data definition should look as close as possible to the original, so that it's easily readable and understandable.
Pattern matches too should remain simple and readable.
Adding a new memoizing field later should not break existing code, in particular:
- not to break existing patterns,
- not to require changes where the function we want to memoize was used (like code that used fv in this example).

I'll describe my current solution: To keep the data definition and pattern matches as little cluttered as possible, let's define:

data Lambda' memo = Var memo VSym 
                  | App memo (Lambda' memo) (Lambda' memo)
                  | Abs memo VSym (Lambda' memo)
type Lambda = Lambda' LambdaMemo

where the data to be memoized is defined separately:

data LambdaMemo = LambdaMemo { _fv :: Set VSym, _depth :: Int }

Then a simple function that retrieves the memoized part:

memo :: Lambda' memo -> memo
memo (Var c _)   = c
memo (App c _ _) = c
memo (Abs c _ _) = c

(This could be eliminated by using named fields. But then we'd have to name all the other fields as well.)

This allows us to pick specific parts from the memoize, keeping the same signature of fv as before:

fv :: Lambda -> Set VSym
fv = _fv . memo

depth :: Lambda -> Int
depth = _depth . memo

Finally, we declare these smart constructors:

var :: VSym -> Lambda
var v = Var (LambdaMemo (Set.singleton v) 0) v

app :: Lambda -> Lambda -> Lambda
app s t = App (LambdaMemo (fv s `Set.union` fv t) (max (depth t) (depth s))) s t

abs :: VSym -> Lambda -> Lambda
abs v t = Abs (LambdaMemo (v `Set.delete` fv t) (1 + depth t)) v t

Now we can efficiently write things that mix pattern matching with reading the memoized fields like

canSubstitute :: VSym -> Lambda -> Lambda -> Bool
canSubstitute x s t
  | not (x `Set.member` (fv t))
      = True -- the variable doesn't occur in `t` at all
canSubstitute x s t@(Abs _ u t')
  | u `Set.member` (fv s)
      = False
  | otherwise
      = canSubstitute x s t'
canSubstitute x s (Var _ _)
      = True
canSubstitute x s (App _ t1 t2)
      = canSubstitute x s t1 && canSubstitute x s t2

This seems to solve:

Pattern matching is still quite reasonable.
If we add a new memoizing field it won't disrupt existing code.
If we decide to memoize a function with signature Lambda -> Something we can easily add it as a new memoizing field.

What I still don't like about this design:

The data definition isn't so bad, but still placing memo everywhere clutters it considerably.
We need to have smart constructors for constructing values but we use the regular constructors for pattern matching. This is not so bad, we simply add one _, but having the same signature for constructing and deconstructing would be nice. I suppose Views or Pattern Synonyms would solve it.
The computation of the memoized fields (free variables, depth) is tightly coupled to the smart constructors. As it's reasonable to assume that those memoized functions will be always catamorphisms, I believe this could be solved to some extent by tools like the fixpoint package.

Any ideas how to improve it? Or are there better ways to solve such a problem?

Solution

I think all of your goals can be met by using plain old memoization in the function instead of by caching results in the ADT itself. Just a couple weeks ago, I released the stable-memo package, which should help here. Checking over your criteria, I don't think we could do any better than this:

Your data definition doesn't change at all.
Pattern matching doesn't change, either.
Existing code doesn't have to change merely because you write more memoized functions.
- No existing patterns get broken.
- No existing memoized functions get broken.

Using it is very simple. Just apply memo to any function you want to memoize, making sure that you use the memoized version of the function everywhere, even in recursive calls. Here's how to write the example you used in your question:

import Data.StableMemo

type VSym = String

data Lambda = Var VSym 
            | App Lambda Lambda
            | Abs VSym Lambda

fv :: Lambda -> Set VSym
fv = memo go
  where
    go (Var v)   = Set.singleton v
    go (App s t) = fv s `Set.union` fv t
    go (Abs v t) = v `Set.delete` fv t

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow