How can one interpret an Abstract Syntax Tree without recursion?

https://softwareengineering.stackexchange.com/questions/387526

21-02-2021
|

Question

I am busy building an interpreter for a Rebol-like language. I have previously built a Lisp interpreter where I used recursion, but now I just want to use a stack and not rely on the recursion capability of the host language.

The parsing of the code works fine, and now I have an abstract syntax tree, i.e. the model of the code in memory. I can traverse the tree fine by just using a stack (i.e. without recursion) but the problem comes about when I need to resolve a value and then, e.g. assign the result of the evaluation.

So let's say the code, for example is

x: power 2 3

In my current block, I encounter x: which is to set the value of the word 'x'.

Then I look at the next element, which is a word which would evaluate to a function with two inputs, and I can see how to resolve that, but then I need to come back and assign the result of that evaluation to the new word.

Doing that with recursion is easy, because when the function call returns, it's return value is set as the value of the word. (The execution returns to the point where I was doing the assignment).

Without recursion, I need to know, when popping the stack after the function call, that I now need to know that I was busy with an assignment and complete that.

The irony is, when doing recursion, the host language obviously has this same problem, but they have resolved it already.

Solution

You need an operand stack (to store intermediate values) as well as a call stack where you place something to remind you what to do next when you return from a sub-activity. Most CPUs mix those two stacks, but some languages (for example, Forth and PostScript) keep them separate. Normally what gets placed on the call stack is a program counter value, but in a simple interpreter you might place the name of a function to execute when you return from a call.

OTHER TIPS

You can systematically calculate such an approach. For simplicity, I'll consider evaluating arithmetic expressions, but the transforms are completely mechanical. I'll use Haskell as the language.

data Expr = Const Int      -- This type represents the abstract syntax tree.
          | Add Expr Expr
          | Mul Expr Expr

-- A straight forward recursive implementation.
eval (Const n) = n
eval (Add l r) = eval l + eval r
eval (Mul l r) = eval l * eval r

Transformation 1: CPS transform

evalCPS (Const n) k = k n
evalCPS (Add l r) k = evalCPS l (\x -> evalCPS r (\y -> k (x + y)))
evalCPS (Mul l r) k = evalCPS l (\x -> evalCPS r (\y -> k (x * y)))

eval1 e = evalCPS e (\x -> x)

Already this has eliminated the need for a run-time stack. All the calls in this code are tail calls (assuming we treat + and * as primitive operations). If your implementation language supports higher-order functions, you can already use these continuations as your "stack", and you can rewrite the straight tail recursion as a while loop. However, we can apply another transformation to get a data structure which will look more like a stack.

Transformation 2: Defunctionalization

This is another mechanical transformation. We go through and replace each anonymous function (lambda) with a data constructor which will hold the free variables, and then write an apply function to interpret that data constructor as the body of the lambda it replaced. There are five lambdas, so we'll end up with five data constructors.

data Cont = Done                   -- This type represents the stack.
          | LeftAddCont Expr Cont
          | RightAddCont Int Cont
          | LeftMulCont Expr Cont
          | RightMulCont Int Cont

applyCont               Done n = n
applyCont  (LeftAddCont r k) x = evalD r (RightAddCont x k)
applyCont (RightAddCont x k) y = applyCont k (x + y)
applyCont  (LeftMulCont r k) x = evalD r (RightMulCont x k)
applyCont (RightMulCont x k) y = applyCont k (x * y)

evalD (Const n) k = applyCont k n
evalD (Add l r) k = evalD l (LeftAddCont r k)
evalD (Mul l r) k = evalD l (LeftMulCont r k)

eval2 e = evalD e Done

As is always the case when you defunctionalize code in continuation-passing style, the type of continuations is isomorphic to a list. So just refactoring the above gives:

data StackFrame = LeftAdd Expr
                | RightAdd Int
                | LeftMul Expr
                | RightMul Int

type Stack = [StackFrame]

consumeStack               [] n = n
consumeStack  (LeftAdd r:stk) x = evalS r (RightAdd x:stk)
consumeStack (RightAdd x:stk) y = consumeStack stk (x + y)
consumeStack  (LeftMul r:stk) x = evalS r (RightMul x:stk)
consumeStack (RightMul x:stk) y = consumeStack stk (x * y)

evalS (Const n) stk = consumeStack stk n
evalS (Add l r) stk = evalS l (LeftAdd r:stk)
evalS (Mul l r) stk = evalS l (LeftMul r:stk)

eval3 e = evalS e []

Finally, refactoring to a while-loop. We can think of the two arguments of evalS and of consumeStack as being mutable variables that are being updated as we go around a loop. Haskell doesn't have mutable variables or while-loops, so I'll switch to Python.

class Const:
    def __init__(self, n):
        self.value = n
        self.isConst = True

class Op:
    def __init__(self, l, r, isAdd):
        self.left = l
        self.right = r
        self.isConst = False
        self.isAdd = isAdd

def Add(l, r): return Op(l, r, True)

def Mul(l, r): return Op(l, r, False)

class LeftFrame:
    def __init__(self, expr, isAdd):
        self.expr = expr
        self.isRight = False
        self.isAdd = isAdd

class RightFrame:
    def __init__(self, n, isAdd):
        self.value = n
        self.isRight = True
        self.isAdd = isAdd

def eval(expr):
    e   = expr
    stk = []
    n   = None
    while n is None or len(stk) != 0:
        if not e.isConst:
            stk.append(LeftFrame(e.right, e.isAdd))
            e = e.left
            continue

        n = e.value # e is a Const

        while len(stk) != 0:
            f = stk.pop()
            if f.isRight:
                if f.isAdd:
                    n = n + f.value
                else:
                    n = n * f.value
            else:
                e = f.expr
                stk.append(RightFrame(n, f.isAdd))
                break
    return n

# (2+3)*((4*4)+5)
print(eval(Mul(Add(Const(2), Const(3)), Add(Mul(Const(4), Const(4)), Const(5)))))

If you really wanted to, you could add a Boolean to track whether you were in the evalS loop or the consumeStack loop and combine the two while-loops into one.

Of course, you don't have to write out each of the intermediate steps or be so mechanical about it.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange