Question

I've read the whole Dragon Book recently (just for fun, I'm not really planning to implement an actual compiler), and I was left with this big question dangling in my head.

What is different between implementing a compiler and an interpreter?

To me a compiler is made up of:

  • Lexer
  • Parser (which builds the syntax tree)
  • Generate Intermediate code (like 3 address code)
  • Do all these crazy things to optimize if you want :-)
  • Generate "assembly" or "native code" from the 3 address code.

Now, obviously, the interpreter also has the same lexer and parser as the compiler.
But what does it do after that?

  • Does it "read" the syntax tree and execute it directly? (kind of like having an instruction pointer pointing to the current node in the tree, and the execution is one big tree traversal plus the memory management for the call stack) (and if so, how does it do it? I'm hoping the execution is better than a huge switch statement that checks what type of node it is)

  • Does it generate 3 address code and interpret that? (if so, how does it do it? Again, I'm looking for something more elegant than a mile long switch statement)

  • Does it generate real native code, load it into memory, and make it run? (at which point I'm guessing it's not an interpreter anymore, but more like a JIT compiler)

Also, at which point does the concept of "virtual machine" cut in? What do you use a virtual machine for in a language? (to be clear about my level of ignorance, to me a virtual machine is VMWare, I have no idea how the concept of VM applies to programming languages / executing programs).

As you can see, my question is quite broad. I'm looking not only for which method is used, but first for an understanding of the big concepts, and then for how it all works in detail. I want the ugly, raw details. Obviously, this is more a quest for references to things to read than an expectation that you answer all these details here.

Thanks!
Daniel


EDIT: Thank you for your answers so far. I realized my title was misleading though. I understand the "functional" difference between a compiler and an interpreter.
What I'm looking for is the difference in how you implement an interpreter vs. a compiler.
I understand now how a compiler is implemented, the question is how an interpreter differs from that.

For example: VB6 is clearly both a compiler and an interpreter. I understand the compiler part now. However, I cannot grasp how, when running inside the IDE, it could let me stop the program at any arbitrary point, change the code, and resume execution with the new code. That's just one tiny example; it's not the answer I'm looking for. What I'm trying to understand, as I explain below, is what happens after I have a parse tree. A compiler will generate new code from it in the "target" language. What does an interpreter do?

Thank you for your help!

Solution

Short answer:

  • A compiler converts source code into an executable format for later execution
  • An interpreter evaluates source code for immediate execution

There is a great deal of leeway in how either is implemented. It is possible for an interpreter to generate native machine code and then execute that, while a compiler for a virtual machine may generate p-code instead of machine code. Threaded interpreted languages like Forth look up keywords in a dictionary and execute their associated native-code function immediately.
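
To make the dictionary idea concrete, here is a minimal sketch of Forth-style dispatch in Python (the words and their definitions are invented for illustration; a real Forth threads native code, not closures):

# Each word maps to a function that manipulates a shared data stack.
stack = []

words = {
    "+":   lambda: stack.append(stack.pop() + stack.pop()),
    "dup": lambda: stack.append(stack[-1]),
    ".":   lambda: print(stack.pop()),
}

def interpret(source):
    for token in source.split():
        if token in words:
            words[token]()             # execute the word's code immediately
        else:
            stack.append(int(token))   # anything else is a number literal

interpret("2 3 + dup + .")             # prints 10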

Compilers can generally optimize better because they have more time to study the code and produce a file for later execution; interpreters have less time to optimize because they tend to execute the code "as is" upon first sight.

An interpreter that optimizes in the background, learning better ways to execute the code, is also possible.

Summary: the difference really comes down to "prepare the code for later execution" versus "execute the code right now".

OTHER TIPS

A compiler is a program that translates a program in one programming language to a program in another programming language. That's it - plain and simple.

An interpreter translates a programming language into its semantic meaning.

An x86 chip is an interpreter for x86 machine language.

javac is a compiler from Java to the Java virtual machine's bytecode. java, the executable application, is an interpreter for the JVM.

Some interpreters share some elements of compilation in that they may translate one language into another internal language that is easier to interpret.

Interpreters usually, but not always, feature a read-eval-print loop.
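
A sketch of how small that loop can be, cheating by using Python's built-in eval() where a real interpreter would parse and evaluate:

while True:
    try:
        line = input(">>> ")   # read
    except EOFError:
        break
    result = eval(line)        # eval -- stands in for parse + interpret
    print(result)              # print, then loop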

A program is a description of work you want done.

A compiler converts a high-level description into a simpler description.

An interpreter reads a description of what to do and does the work.

  • Some interpreters (e.g. Unix shells) read the description one small piece at a time and act on each piece as they see it; some (e.g. Perl, Python) read the entire description, internally convert it to a simpler form and then act on that.
  • Some interpreters (e.g. Java's JVM, or a Pentium 4 chip) only understand a very simple description language that is too tedious for humans to work with directly, so humans use compilers to convert their high-level descriptions to this language.

Compilers never do the work. Interpreters always do the work.

Both have much in common (e.g. the lexer and parser) and there is disagreement on the difference. I look at it this way:

The classical definition would be that a compiler parses and translates a stream of symbols into a stream of bytes that can be run directly by the CPU, whereas an interpreter does the same thing but translates them into a form that must be executed by a piece of software (e.g. the JVM, the CLR).

Yet people call javac a compiler, so the informal definition of a compiler is something that must be run on the source code as a separate step, whereas interpreters have no "build" step (e.g. PHP, Perl).

It's not as clear-cut as it used to be. It used to be: build a parse tree, bind it, and execute it (often binding at the last second).

BASIC was mostly done this way.

You could claim that things that run bytecode (Java/.NET) without doing a JIT are interpreters - but not in the traditional sense, since you still have to "compile" to bytecode.

The old-school difference was: if it generates CPU code, it's a compiler. If you run it directly in your editing environment and can interact with it while editing, it's an interpreter.

That was far less formal than the actual Dragon book - but I hope it's informative.

If my experience indicates anything:

  1. Interpreters don't try to reduce or process the AST any further; each time a block of code is referenced, the relevant AST node is traversed and executed. Compilers traverse a block at most a few times, generate executable code in a definite place, and are done with it.
  2. An interpreter's symbol table holds values and is consulted during execution; a compiler's symbol table holds the locations of variables, and no symbol table exists at run time.

In short, the difference may be as simple as the one between

case '+':  /* interpreter: perform the addition now, on stored values */
    symtbl[var3] = symtbl[var1] + symtbl[var2];
    break;

and

case '+':  /* compiler: emit code that will do the addition later */
    printf("%s = %s + %s;\n", symtbl[var3], symtbl[var1], symtbl[var2]);
    break;

(It doesn't matter if you target another language or (virtual) machine instructions.)

In regard to this part of your question, which the other answers haven't really addressed:

Also, at which point does the concept of "virtual machine" cut in? What do you use a virtual machine for in a language?

Virtual machines like the JVM or the CLR are a layer of abstraction that lets you reuse the JIT compiler, the garbage collector, and other implementation work for completely different languages that are compiled to run on the VM.

They also help make the language specification more independent of the actual hardware. For example, while C code is theoretically portable, you constantly have to worry about things like endianness, type sizes and variable alignment if you actually want to produce portable code. With Java, the JVM is very clearly specified in these regards, so the language designer and the language's users don't have to worry about them; it's the job of the JVM implementer to implement the specified behaviour on the actual hardware.

Once a parse-tree is available, there are several strategies:

  • Directly interpret the AST (Ruby, WebKit's original interpreter)
  • Transform the code into bytecode or machine code
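
As a hedged sketch of the first strategy, here is a tiny tree-walking evaluator in Python; the node classes are invented for illustration. Dispatching on the node's type is the structured cousin of the "huge switch statement" the question worries about:

from dataclasses import dataclass

@dataclass
class Num:                     # a literal, e.g. 3
    value: float

@dataclass
class BinOp:                   # an operator node with two children
    op: str
    left: object
    right: object

def evaluate(node):
    # One big tree traversal: dispatch on node type, recurse on children.
    if isinstance(node, Num):
        return node.value
    if isinstance(node, BinOp):
        a, b = evaluate(node.left), evaluate(node.right)
        return {"+": a + b, "*": a * b}[node.op]
    raise TypeError(f"unknown node: {node!r}")

tree = BinOp("*", BinOp("+", Num(1), Num(2)), Num(4))   # (1 + 2) * 4
print(evaluate(tree))                                   # 12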

To achieve Edit-and-Continue, the program counter or instruction pointer has to be recalculated and moved. This requires cooperation from the IDE, because code may have been inserted before or after the little yellow arrow.

One way this could be done is to embed the position of the program counter in the parse tree. For instance, there might be a special statement called "break". The program counter only needs to be positioned after the "break" instruction to continue running.

In addition, you have to decide what you want to do about the current stack frame (and the variables on the stack): perhaps pop the current frame and copy the variables over, or keep the stack but patch in a GOTO and a RETURN to the current code.
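
A hedged sketch of the idea in Python (everything here is invented for illustration): if the interpreter's program counter is just an index into a list of statements, the IDE can shift it to account for an edit and resume:

program = ["x = 1", "x = x + 1", "print(x)"]
env = {}

pc = 0
while pc < 2:                  # run until a "breakpoint" before statement 2
    exec(program[pc], env)     # exec() stands in for interpreting one statement
    pc += 1

# The user inserts a statement above the pause point; the IDE shifts the
# program counter so execution resumes at the same yellow arrow.
program = ["x = 1", "x = x * 10", "x = x + 1", "print(x)"]
pc += 1                        # one statement was inserted before the arrow

while pc < len(program):       # resume with the new code
    exec(program[pc], env)     # prints 2; the inserted line never ran
    pc += 1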

Given your list of steps:

  • Lexer
  • Parser (which builds the syntax tree)
  • Generate Intermediate code (like 3 address code)
  • Do all these crazy things to optimize if you want :-)
  • Generate "assembly" or "native code" from the 3 address code.

A very simple interpreter (like early BASICs or Tcl) would perform only steps one and two, one line at a time, and then throw away most of the results while proceeding to the next line to be executed. The other three steps would never be performed at all.
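
A hedged sketch of that style in Python (the two-keyword grammar is invented): each line is lexed, recognized, and acted on immediately, and nothing but the variables survives to the next line:

variables = {}

def run_line(line):
    tokens = line.split()                    # step 1: lex
    if tokens[0] == "LET":                   # step 2: recognize and execute
        variables[tokens[1]] = int(tokens[2])
    elif tokens[0] == "PRINT":
        print(variables[tokens[1]])
    else:
        raise SyntaxError(line)
    # the token list is discarded here; nothing is compiled or kept

for line in ["LET X 5", "PRINT X"]:          # prints 5
    run_line(line)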

If you're looking for a book, Structure and Interpretation of Computer Programs ("the Wizard book") is a good place to start with interpreter concepts. You're only ever dealing with Scheme code, which can be traversed, evaluated, and passed around as if it were an AST.

Also, Peter Norvig has a short example explaining the main idea in Python (with many more examples in the comments), and Wikipedia has another small example.

Like you said, it's a tree traversal, and at least for call-by-value it's a simple one: whenever you see an operator, evaluate the operands first, then apply the operator. The final value returned is the result of the program (or of the statement given to a REPL).

Note that you don't always have to do the tree traversal explicitly: you could generate your AST in such a way that it accepts a visitor (I think SableCC does this), or, for very small languages like the little arithmetic grammars used to demonstrate parser generators, you can just evaluate the result during parsing.

In order to support declarations and assignments, you need to keep an environment around. Just as you'd evaluate "plus" by adding the operands, you'd evaluate the name of a function, variable, etc. by looking it up in the environment. Supporting scope means treating the environment like a stack and pushing and popping things at the right time. In general, how complicated your interpreter is depends on which language features you mean to support; features like garbage collection and introspection, for instance, are the interpreter's job to provide.
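
A sketch of that environment idea (names invented for illustration): scopes form a chain of dictionaries, lookup walks outward, and entering or leaving a function pushes or pops a scope:

class Environment:
    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent          # enclosing scope, or None at top level

    def define(self, name, value):
        self.vars[name] = value

    def lookup(self, name):
        env = self
        while env is not None:        # walk outward through enclosing scopes
            if name in env.vars:
                return env.vars[name]
            env = env.parent
        raise NameError(name)

globals_env = Environment()
globals_env.define("x", 10)

# entering a function body pushes a new scope...
local_env = Environment(parent=globals_env)
local_env.define("y", 2)

print(local_env.lookup("x") + local_env.lookup("y"))  # 12; "x" is found outward
# ...and leaving the body simply drops local_env, "popping" the stack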

For VMs: plinth and j_random_hacker described computer hardware as a kind of interpreter. The reverse is also true -- interpreters are machines; their instructions just happen to be higher-level than those of a real ISA. For VM-style interpreters, the programs actually resemble machine code, albeit for a very simple machine. Java bytecode uses just a few "registers," one of which holds a program counter. So a VM interpreter is more like a hardware emulator than the interpreters in the examples I linked above.
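
A hedged sketch of such a VM-style interpreter (the opcodes are invented; real JVM bytecode is similar in spirit, not in detail): a program counter steps through a list of instructions, and a dispatch loop executes each one against an operand stack:

def run(code):
    stack, pc = [], 0                 # operand stack and program counter
    while pc < len(code):
        op, arg = code[pc]            # fetch
        if op == "PUSH":              # decode and execute
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "PRINT":
            print(stack.pop())
        pc += 1                       # advance to the next instruction

run([("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PRINT", None)])  # prints 5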

But note that, for speed reasons, the default Oracle JVM works by translating runs of Java bytecode instructions into x86 instructions ("just in time compilation").

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow