When designing a bytecode interpreter, is there a consensus these days on whether a stack-based or a three-address format (or something else?) is better? I'm looking at these considerations:

  1. The objective language is a dynamic language fairly similar to JavaScript.

  2. Performance is important, but development speed and portability are more so for the moment.

  3. Therefore the implementation will be strictly an interpreter for the time being; a JIT compiler may come later, resources permitting.

  4. The interpreter will be written in C.

Solution

Take a look at the OCaml bytecode interpreter - it's one of the fastest of its kind. It is pretty much a stack machine, translated into threaded code on loading (using the GNU computed-goto extension). Generating Forth-like threaded code as well should be relatively easy to do.
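For a concrete feel for the technique, here is a minimal sketch of computed-goto (direct-threaded) dispatch in C. The opcodes and the toy program are invented for illustration, and it only builds where the GNU "labels as values" extension is available (GCC, Clang):

    #include <stdio.h>

    enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

    int main(void)
    {
        /* One label address per opcode; indices must match the enum. */
        static void *dispatch[] = { &&op_push1, &&op_add, &&op_print, &&op_halt };

        int code[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
        int stack[16], *sp = stack;
        int *ip = code;

        /* NEXT jumps straight to the next opcode's handler,
         * replacing the central switch of a conventional loop. */
        #define NEXT goto *dispatch[*ip++]
        NEXT;

    op_push1: *sp++ = 1;              NEXT;
    op_add:   sp--; sp[-1] += sp[0];  NEXT;
    op_print: printf("%d\n", sp[-1]); NEXT;
    op_halt:  return 0;
    }

Each handler ends by dispatching directly to the next one, so there is no single shared branch for the CPU to mispredict, which is where much of the speedup over a plain switch loop comes from.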

But if you're keeping future JIT compilation in mind, make sure that your stack machine is not really a full-featured stack machine, but an expression-tree serialisation form instead (like the .NET CLI) - this way you'll be able to translate your "stack" bytecode into a 3-address form and then into SSA.
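To make that translation concrete, here is a hedged sketch of the standard lowering trick: symbolically execute the stack, giving every pushed value a fresh temporary, so each stack instruction becomes one three-address instruction. The opcodes and output format are invented for illustration:

    #include <stdio.h>

    enum { B_PUSH, B_ADD, B_END };

    /* Walk the stack bytecode once, keeping a stack of temporary
     * names instead of values. Each result gets a fresh temporary,
     * and each stack instruction prints one three-address instruction. */
    static void lower(const int *bc)
    {
        int vstack[64], top = 0, next_tmp = 0;

        for (; *bc != B_END; bc++) {
            switch (*bc) {
            case B_PUSH: {                       /* operand follows opcode */
                int t = next_tmp++;
                printf("t%d = const %d\n", t, *++bc);
                vstack[top++] = t;
                break;
            }
            case B_ADD: {
                int b = vstack[--top], a = vstack[--top];
                int t = next_tmp++;
                printf("t%d = t%d + t%d\n", t, a, b);
                vstack[top++] = t;
                break;
            }
            }
        }
    }

    int main(void)
    {
        /* Stack code for (1 + 2) + 3 */
        int bc[] = { B_PUSH, 1, B_PUSH, 2, B_ADD, B_PUSH, 3, B_ADD, B_END };
        lower(bc);
        return 0;
    }

Because every temporary is assigned exactly once, the output is already in single-assignment form, which is why this representation is such a short step from SSA.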

Other tips

Read The Evolution of Lua and The Implementation of Lua 5.0 for how Lua changed from a stack-based virtual machine to a register-based virtual machine and why it gained performance by doing so.

Experiments done by David Gregg and Roberto Ierusalimschy have shown that a register-based bytecode works better than a stack-based bytecode because fewer bytecode instructions (and therefore less decoding overhead) are required to do the same tasks. So the three-address format is a clear winner.
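The classic illustration of that instruction-count gap is the statement a = b + c (roughly the example used in the Lua 5.0 paper); the mnemonics below are illustrative, not Lua's actual opcodes:

    ; a = b + c, with a, b, c in slots/registers 0, 1, 2

    ; stack-based VM: four fetch/decode/dispatch cycles
    GETLOCAL 1      ; push b
    GETLOCAL 2      ; push c
    ADD             ; pop b and c, push b + c
    SETLOCAL 0      ; pop the result into a

    ; register-based VM: one cycle
    ADD 0 1 2       ; R(0) := R(1) + R(2)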

I don't have much (really, any) experience in this area, so you might want to verify some of the following for yourself (or maybe someone else can correct me where necessary?).

The two languages I work with most nowadays are C# and Java, so I'm naturally inclined toward their methodologies. As most people know, both are compiled to bytecode, and both platforms (the CLR and the JVM) use JIT compilation (at least in the mainstream implementations). Also, I would guess that the JIT compilers for each platform are written in C/C++, but I don't know for sure.

All in all, these languages and their respective platforms are pretty similar to your situation (aside from the dynamic part, but I'm not sure whether that matters). Also, since they are such mainstream languages, I'm sure their implementations can serve as a pretty good guide for your design.


With that out of the way, I know for sure that both the CLR and the JVM are stack-based architectures. Some of the advantages I remember for stack-based over register-based are

  1. Smaller generated code (see the sketch below)
  2. Simpler interpreters
  3. Simpler compilers
  4. etc.

Also, I find stack-based bytecode a little more intuitive and readable, but that's a subjective thing, and like I said before, I haven't seen much bytecode yet.
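To illustrate advantage #1 above: stack opcodes leave their operands implicit (they are always on top of the stack), so the same operation costs fewer bytes to encode. The encodings below are invented for illustration:

    #include <stdio.h>

    enum { OP_ADD = 7 };   /* invented opcode value */

    /* Stack VM: ADD pops two values and pushes their sum, so the
     * opcode byte alone is the whole instruction. */
    static const unsigned char stack_add[] = { OP_ADD };

    /* Register VM: the same ADD must also name dst, src1 and src2. */
    static const unsigned char reg_add[] = { OP_ADD, 0, 1, 2 };

    int main(void)
    {
        printf("stack ADD: %zu byte(s), register ADD: %zu byte(s)\n",
               sizeof stack_add, sizeof reg_add);   /* 1 vs. 4 */
        return 0;
    }

Real encodings differ in detail, but the asymmetry (no operand fields versus three of them) is the general pattern.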

Some advantages of the register-based architecture are

  1. Fewer instructions must be executed
  2. Faster interpreters (follows from #1)
  3. Can more readily be translated to machine code, since most commonplace hardware is register-based (see the sketch below)
  4. etc.

Of course, there are always ways to offset the disadvantages of each, but I think these cover the obvious things to consider.
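As a hedged sketch of advantage #3: in a register VM the operand fields index a register file directly, so an instruction like ADD dst, src1, src2 is already close to what a later JIT would emit. The encoding below is invented for illustration:

    #include <stdint.h>
    #include <stdio.h>

    enum { R_ADD, R_HALT };

    /* Minimal register-VM loop: operand bytes index a register file,
     * which is why such code maps almost 1:1 onto hardware registers
     * when it is JIT-compiled later. */
    static void run(const uint8_t *ip, int64_t *reg)
    {
        for (;;) {
            switch (*ip) {
            case R_ADD:                    /* ADD dst, src1, src2 */
                reg[ip[1]] = reg[ip[2]] + reg[ip[3]];
                ip += 4;
                break;
            case R_HALT:
                return;
            }
        }
    }

    int main(void)
    {
        int64_t reg[4] = { 0, 2, 3, 0 };
        const uint8_t code[] = { R_ADD, 0, 1, 2, R_HALT };  /* r0 = r1 + r2 */
        run(code, reg);
        printf("%lld\n", (long long)reg[0]);   /* prints 5 */
        return 0;
    }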

If you have a JIT in mind, then bytecode is the only option.

Just in case, you can take a look at my TIScript: http://www.codeproject.com/KB/recipes/TIScript.aspx and its sources: http://code.google.com/p/tiscript/
