How hard is it to write an interpreted language assuming you have an AST?

https://stackoverflow.com/questions/5008650

14-11-2019
|

Question

I already have a parser for a language I've been working on. Is making it interpreted difficult? I was thinking its simple. The parsing and syntax check is done. I just have a tree of objects. Everytime an object is created I create a branch and store its type (string, int, float, class/obj). Then everytime a new member is added to the object I create a branch and repeat.

I try to make it sound simple. I still need to check of object A can be added to object B and such.

Is it actually fairly simple after AST and syntax check is done or is there still a lot more work?

Solution

Typically you need to build up symbol tables and do type checking. For some langauges, you can do this on the fly; for others, I think pretty much have to the name resolution and type checking pretty much first or you won't be able to interpret it well (C++ comes to mind).

Once you have symbol tables constructed, you can pretty much write an interpreter by walking the tree in exeuction order and doing what the operators say. Basic arithmetic is pretty easy. String and dynamic storage management is harder; you to figure out how you are going to handle storage allocation and deallocatoin, and for langauages that manage storage, you'll end up have to implement some kind of garbage collector. Life gets complicated quickly at this point.

You'll likely discover your langauage needs features you didn't consider. Exception handling? Multiple asignments? Local scopes? Lambdas? Closures? You'll find out pretty quickly how much modern languages have that make them useful.

As you start to write more complicated programs, you'll need a debugger. Breakpoints? Single step? Variable inspection? Update? Start at arbitrary places? Read-eval-print loop?

You still need to tie you language to external libraries; most people want to talk to consoles and files; do you want buffered files or are you OK with 1 character at a time and the corresponding performance hit? You'll get to argue with characater representations (7 bit ascii? 8 bit? UTF8 with non-unit wide characters? Full Unicode?) and standard support libraries (string concatentation, search, number conversions [including accurate floating point conversions both ways], large number arithmetic, floating point traps, .... The list of issues gets pretty long if you want a useful programming language.

The interpreter core will likely be pretty small. Youll find the other stuff probably burns one or two orders of magnitude more effort. Somewhere in here, if you want anybody to use the langauge, you need to document all the choices you made. And heaven help you if you change the interpreter a little bit after somebody gets a big application running.

Next, somebody will complain about performance. Now you get to tune your implementation, and start regrettng the fact that you wrote an interpreter instead of a compiler.

Enjoy. If you have an AST, you've barely scratched the surface. If you do take this on, you'll learn to really appreciate what modern languages provide out of the box, and how much effort it to took to provide it.

OTHER TIPS

It depends on how complex a language you want to write an interpreter for and your choice of tools. Simple interpreters are simple.

Consider the following as the definition on an AST in Haskell for a language that supports higher order functions and numbers:

data Exp = Lam String Exp 
         | App Exp Exp 
         | Var String 
         | Num Int

Now you can write an interpreter for it as the simple "eval" function:

eval (App e1 e2) env = (unF (eval e1 env)) (eval e2 env)
eval (Lam x e) env   = F (\v -> (eval e ((x,v):env)))
eval (Num n) env     = N n
eval (Var x) env     = case (lookup x env) of 
                         Just v -> v
                         Nothing -> error ("Unbound variable " ++ x)

And that's it. The few boring supporting definitions are as follows.

data Val = F (Val -> Val)  | N Int
unF (F x) = x
instance Show Val where 
    show (F _) = "<procedure>"
    show (N n) = show n

In other words, if you copy paste the above three blocks of code into a Haskell source file you will have a working interpreter, which you can test using ghci as follows:

*Main> eval (App (Lam "x" (Var "x")) (Num 1)) []
1
*Main> eval (Var "x") []
*** Exception: Unbound variable x

You could read about creating languages in the classic SICP or EOPL or the little book. If you wish to build a typed language, you may have to read some more.

That said, if you are going to build languages, might I strongly recommend lots of reading first. For one, its very rewarding. And secondly too many hideous languages have been inflicted on the world by people who don't know how to create languages (many of which have become very popular for various historical reasons) and we are stuck with them.

I'd say that the difficult part (and the funniest part actually) begins after you have the AST done.

Have a look at LLVM, it has bindings for a lot of languages (I used only C++ and Haskell, I can't tell for other languages), and it should help you writing a just in time compiler for your language. Actually, LLVM makes it easier to write a compiler than an interpreter !

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow