Question

We are currently working on the JIT compilation part of our own Java Virtual Machine implementation. Our idea is to do a simple translation of the given Java bytecode into native opcodes, write them to executable memory and CALL right to the start of the method.

Assuming the given Java code would be:

int a = 13372338;
int b = 32 * a;
return b;

Now, the following approach was taken (assuming that the given memory starts at 0x1000 and the return value is expected in eax):

0x1000: first local variable - accessible via [eip - 8]
0x1004: second local variable - accessible via [eip - 4]
0x1008: start of the code - accessible via [eip]

Java bytecode | Assembler code (NASM syntax)
--------------|------------------------------------------------------------------
              | // start
              | mov edx, eip
              | push ebx
              |         
              | // method content
ldc           | mov eax, 13372338
              | push eax
istore_0      | pop eax
              | mov [edx - 8], eax
bipush        | push 32
iload_0       | mov eax, [edx - 8]
              | push eax
imul          | pop ebx
              | pop eax
              | mul ebx
              | push eax
istore_1      | pop eax
              | mov [edx - 4], eax
iload_1       | mov eax, [edx - 4]
              | push eax
ireturn       | pop eax
              |         
              | // end
              | pop ebx
              | ret

This would simply use the stack just like the virtual machine does itself. The questions regarding this solution are:

  • Is this method of compilation viable?
  • Is it even possible to implement all the Java instructions this way? How could things like athrow/instanceof and similar commands be translated?

Solution

This method of compilation works, is easy to get up and running, and it at least removes interpretation overhead. But it results in pretty large amounts of code and pretty awful performance. One big problem is that it transliterates the stack operations 1:1, even though the target machine (x86) is a register machine. As you can see in the snippet you posted (as well as in any other code generated this way), this always results in several stack manipulation opcodes for every single operation, so it uses the registers - heck, the whole ISA - about as ineffectively as possible.

You can also support complicated control flow such as exceptions; it's not very different from implementing it in an interpreter. If you want good performance, though, you don't want to perform work every time you enter or exit a try block. There are schemes to avoid this, used by both C++ and other JVMs (keyword: zero-cost or table-driven exception handling). These are quite complex to implement, understand and debug, so you should go with a simpler alternative first. Just keep it in mind.
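To give a rough idea of the table-driven approach (every label here is made up for illustration): entering the try block costs nothing at run time; instead the compiler emits a side table that the runtime searches for the current instruction pointer when a throw actually occurs.

exception_table:
 dd try_start, try_end       ; code range covered by this handler
 dd handler_addr             ; where to transfer control if it matches
 dd class_info_MyException1  ; exception type handled there
 dd 0, 0, 0, 0               ; end-of-table marker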

As for the generated code: The first optimization, one which you'll almost definitely need, is converting the stack operations into three-address code or some other representation that uses registers. There are several papers on this and implementations of it, so I won't elaborate unless you want me to. Then, of course, you need to map these virtual registers onto physical registers. Register allocation is one of the most well-researched topics in compiler construction, and there are at least half a dozen heuristics that are reasonably effective and fast enough to use in a JIT compiler. One example off the top of my head is linear scan register allocation (created specifically for JIT compilation).
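For the example method, such a conversion could look roughly like this (just a sketch; the virtual registers t0/t1 are made-up names):

; three-address form, still using virtual registers:
;   t0 = 13372338        (ldc, istore_0)
;   t1 = t0 * 32         (bipush, iload_0, imul, istore_1)
;   return t1            (iload_1, ireturn)
; after linear scan register allocation, t0 and t1 can both live in eax:
 mov  eax, 13372338
 imul eax, eax, 32
 ret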

Beyond that, most JIT compilers focused on the performance of the generated code (as opposed to quick compilation) use one or more intermediate representations and optimize the program in that form. This is basically your run-of-the-mill compiler optimization suite, including veterans like constant propagation, value numbering, reassociation, loop-invariant code motion, etc. - these things are not only simple to understand and implement, they've also been described in thirty years of literature, up to and including textbooks and Wikipedia.
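Applied to the example, even the most basic of these passes already collapses the whole method, since 32 * 13372338 is known at compile time (a sketch):

 mov eax, 427914816          ; = 32 * 13372338, folded at compile time
 ret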

The code you'll get with the above will be pretty good for straight-line code using primitives, arrays and object fields. However, you won't be able to optimize method calls at all. Every method is virtual, which means inlining or even moving method calls (for example out of a loop) is basically impossible except in very special cases. You mentioned that this is for a kernel. If you can accept using a subset of Java without dynamic class loading, you can do better (but it'll be nonstandard) by assuming the JIT knows all classes. Then you can, for example, detect leaf classes (or more generally methods which are never overridden) and inline those.
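As a sketch of what devirtualization buys you (assuming objects start with a pointer to a per-class method table; the offset and labels are made up):

; generic virtual call, ecx = object reference:
 mov  eax, [ecx]             ; load the class-info / method-table pointer
 call [eax + 8]              ; indirect call - very hard to inline
; if the JIT can prove the method is never overridden, it can emit instead:
 call method_Foo_run         ; direct call, now a candidate for inlining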

If you do need dynamic class loading, but expect it to be rare, you can still do better, though it takes more work. The advantage is that this approach generalizes to other things, like eliminating logging statements completely. The basic idea is specializing the code based on some assumptions (for example, that a given static field does not change or that no new classes are loaded), then de-optimizing if those assumptions are violated. This means you'll sometimes have to re-compile code while it is running (this is hard, but not impossible).
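A guard in the generated code could look like this sketch (the flag and label are made up; real VMs often patch the code or use traps instead of an explicit check):

 cmp  byte [assumptions_valid], 1
 jne  deoptimize             ; assumption broken: jump to generic code
                             ; or trigger recompilation of this method
 ; ... specialized fast path compiled under the assumption ...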

If you go further down this road, its logical conclusion is trace-based JIT compilation, which has been applied to Java, but AFAIK it didn't turn out to be superior to method-based JIT compilers. It's more effective when you have to make dozens or hundreds of assumptions to get good code, as happens with highly dynamic languages.

OTHER TIPS

Some comments about your JIT compiler (I hope I do not repeat things "delnan" already wrote):

Generic comments

I'm sure "real" JIT compilers work similar to your one. However you could do some optimization (example: "mov eax,nnn" and "push eax" could be replaced by "push nnn").

You should store local variables on the stack; typically "ebp" is used as the pointer to the local variables:

push ebx
push ebp
sub esp, 8      ; 2 variables with 4 bytes each
mov ebp, esp
; now the local variables are addressed using [ebp+0] and [ebp+4]
  ...
add esp, 8      ; release the local variables again
pop ebp
pop ebx
ret

This is necessary because functions may be recursive. Storing a variable at a fixed location (relative to EIP) would cause the variables to behave like "static" ones. (I'm assuming you do not compile a function multiple times in the case of a recursive function.)

Try/Catch

To implement try/catch your JIT compiler does not only have to look at the Java bytecode but also at the try/catch information (the exception table) that is stored alongside the bytecode in the method's Code attribute in the class file. Try/catch can be implemented in the following way:

 ; push all registers that must not be destroyed
 push eax
 push ebp
  ...
 ; save the previous "catch" pointers
 push dword [catch_pointer]
 push dword [catch_stack]
 ; set the new "catch" pointers
 mov [catch_stack], esp
 mov dword [catch_pointer], my_catch
  ... ; some code
 ; a "throw" somewhere inside the "try" section looks like this:
 push exception              ; push the exception object
 jmp [catch_pointer]
  ... ; some code
 ; end of the "try" section: restore the previous "catch" pointers and the registers
 pop dword [catch_stack]
 pop dword [catch_pointer]
  ...
 pop ebp
 pop eax
  ...
 ; the "catch" block
my_catch:
 pop ecx                     ; pop the exception from the stack
 mov esp, [catch_stack]      ; restore the stack pointer
 ; now restore everything, same as at the end of the "try" section
 pop dword [catch_stack]
 pop dword [catch_pointer]
  ...
 pop ebp
 pop eax
 push ecx                    ; push the exception onto the stack again

In a multi-threaded environment each thread requires its own catch_stack and catch_pointer variables!
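One way to do that is to reserve a register that always points to the current thread's data block and address both variables through it (a sketch; the register choice and offsets are arbitrary):

THREAD_CATCH_STACK   equ 0
THREAD_CATCH_POINTER equ 4
 mov [esi + THREAD_CATCH_STACK], esp              ; esi = per-thread data block
 mov dword [esi + THREAD_CATCH_POINTER], my_catch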

Specific exception types can be handled by using "instanceof" in the following way:

try {
    // some code
} catch(MyException1 ex) {
    // code 1
} catch(MyException2 ex) {
    // code 2
}

...is actually compiled like this:

try {
    // some code
} catch(Throwable ex) {
    if(ex instanceof MyException1) {
        // code 1
    }
    else if(ex instanceof MyException2) {
        // code 2
    }
    else throw ex; // not handled!
}

Objects

A JIT compiler for a simplified Java virtual machine without objects (and arrays) would be quite easy, but the objects in Java make the virtual machine very complex.

Objects are simply stored as pointers (references) on the stack or in the local variables. Typically a JIT compiler is implemented like this: for each class there is a piece of memory that contains information about the class (e.g. which methods exist and at which address the compiled code of each method is located, etc.). An object is a piece of memory that contains all instance variables plus a pointer to the memory containing the class information.
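A possible layout could look like this (a sketch; none of these names correspond to real JVM structures):

class_info_MyClass:
 dd class_info_Object        ; pointer to the superclass's class information
 dd method_MyClass_toString  ; addresses of the compiled methods (a vtable)
 dd method_MyClass_run
 ...

; an object instance on the heap then looks like:
;   [obj + 0]   pointer to class_info_MyClass
;   [obj + 4]   first instance variable
;   [obj + 8]   second instance variable, and so on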

"Instanceof" and "checkcast" could be implemented by looking at the pointer to the memory containing information about the class. This information may contain a list of all parent classes and implemented interfaces.

The main problem with objects, however, is memory management in Java: unlike C++ there is a "new" but no "delete". You have to track whether an object is still in use (for example with reference counting or a tracing garbage collector). If an object is no longer used, its memory must be reclaimed and any finalizer must be run.

The problems here are local variables (the same local variable slot may hold an object reference at one time and a number at another) and try/catch blocks (the "catch" block must take care of object references in the local variables and on the stack (!) before restoring the stack pointer).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow