JNI vs. JNA performance

Question 1

JNA is much slower than JNI, but much easier to use. If performance is not an issue, use JNA.

Using direct buffers has the advantage that the most performance-critical operations (reading and writing the data) don't go through JNI or JNA at all and are thus faster. They use intrinsics, which means they get turned into single machine-code instructions.
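A minimal sketch of the Java side, assuming the buffer is handed to native code once (for example through JNI's `GetDirectBufferAddress`) and then filled and read from plain Java. The class name `DirectBufferDemo` is hypothetical, and whether each access really compiles down to a single instruction depends on the JIT, so treat that as the answer's claim rather than a guarantee.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class DirectBufferDemo {
    // Allocate an off-heap (direct) buffer in the platform's native byte order,
    // so native code can read the same bytes without any conversion.
    static FloatBuffer allocateFloats(int count) {
        return ByteBuffer.allocateDirect(count * Float.BYTES)
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer buf = allocateFloats(4);
        // Ordinary Java puts and gets: no JNI or JNA call happens in these loops.
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, i * 1.5f);
        }
        float sum = 0f;
        for (int i = 0; i < buf.capacity(); i++) {
            sum += buf.get(i);
        }
        System.out.println("direct=" + buf.isDirect() + " sum=" + sum);
        // prints "direct=true sum=9.0"
    }
}
```

Only the hand-off of the buffer crosses the Java/native boundary; the per-element work stays in Java.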

If the Java code is significantly slower than the C code, it is likely it hasn't been optimised enough. Generally the GPU should be doing all the work, so if Java is a bit slower this shouldn't make much difference.

e.g. if you spend 99% of the time in the GPU and the Java part (the remaining 1%) takes twice as long, the total will be 99% + 2% = 101% of the original, i.e. only 1% slower.
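The same estimate as a tiny helper (the arithmetic behind Amdahl's law, applied to a slowdown instead of a speedup). `SlowdownEstimate` and `relativeTotal` are illustrative names, not from any library; the model assumes only the non-GPU share of the run time changes.

```java
public class SlowdownEstimate {
    // Relative total run time when only the non-GPU share gets slower.
    // gpuFraction:  share of the original run time spent on the GPU (0..1).
    // javaSlowdown: how many times slower the Java part is than the C version.
    static double relativeTotal(double gpuFraction, double javaSlowdown) {
        double hostFraction = 1.0 - gpuFraction;
        return gpuFraction + hostFraction * javaSlowdown;
    }

    public static void main(String[] args) {
        double total = relativeTotal(0.99, 2.0);
        System.out.printf("total = %.2f of the original (%.0f%% slower)%n",
                total, (total - 1.0) * 100);
        // prints "total = 1.01 of the original (1% slower)"
    }
}
```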

Question 2

I built a simple DLL containing an empty function that does nothing. Then I called that function from Java via both JNA and JNI to measure the cost of the call itself. Averaged over many calls, JNI was 30-40 times faster than JNA.
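A rough sketch of that kind of measurement. The `Runnable` passed in is a placeholder: you would substitute a call to the empty function through a JNI `native` method and through a JNA interface, and run each separately. With the pure-Java no-op used here, the JIT may optimise the call away, so the printed number says nothing about JNI or JNA. `CallOverheadBench` is a hypothetical name; for trustworthy numbers a harness such as JMH handles warm-up and dead-code elimination more carefully.

```java
public class CallOverheadBench {
    // Average wall-clock nanoseconds per call, measured after a warm-up phase
    // so the JIT has compiled the calling path before timing starts.
    static double nanosPerCall(Runnable call, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            call.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            call.run();
        }
        return (double) (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        // Placeholder: replace with the empty native function invoked via JNI,
        // then via JNA, and compare the two averages.
        Runnable emptyCall = () -> { };
        double ns = nanosPerCall(emptyCall, 100_000, 10_000_000);
        System.out.printf("%.2f ns per call%n", ns);
    }
}
```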

Question 3

From JNA's official FAQ:

How does JNA performance compare to custom JNI?

JNA direct mapping can provide performance near that of custom JNI. Nearly all the type mapping features of interface mapping are available, although automatic type conversion will likely incur some overhead.

The calling overhead for a single native call using JNA interface mapping can be an order of magnitude (~10X) greater time than equivalent custom JNI (whether it actually does in the context of your application is a different question). In raw terms, the calling overhead is on the order of hundreds of microseconds instead of tens of microseconds. Note that that's the call overhead, not the total call time. This magnitude is typical of the difference between systems using dynamically-maintained type information and systems where type information is statically compiled. JNI hard-codes type information in the method invocation, where JNA interface mapping dynamically determines type information at runtime.

You might expect a speedup of about an order of magnitude moving to JNA direct mapping, and a factor of two or three moving from there to custom JNI. The actual difference will vary depending on usage and function signatures. As with any optimization process, you should determine first where you need a speed increase, and then see how much difference there is by performing targeted optimizations. The ease of programming everything in Java usually outweighs small performance gains when using custom JNI.

Question 4

The heavy number crunching is done in C/GPU; all your Java <--> C interface does is shuffle data in and out. I'd be surprised if this is a bottleneck.

In any case, write the simplest, clearest code that does the job. If it turns out performance isn't enough, measure where the bottlenecks are, and tackle them one by one until performance is OK. Programmer time is much more valuable than computer time, except in very special circumstances.
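A minimal way to do that first measurement: time each phase and look at its share of the total before deciding what to optimise. `PhaseTimer` and the two phases in `main` are hypothetical stand-ins; in a real program they would wrap the Java <--> C data transfer and the native/GPU computation. A profiler gives a fuller picture, but this is often enough to confirm whether the interface is worth touching.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // Run `phase` and add its elapsed time to the running total for `name`.
    void time(String name, Runnable phase) {
        long start = System.nanoTime();
        phase.run();
        nanosByPhase.merge(name, System.nanoTime() - start, Long::sum);
    }

    // Share (0..1) of all measured time spent in the named phase.
    double share(String name) {
        long total = 0;
        for (long t : nanosByPhase.values()) total += t;
        long mine = nanosByPhase.getOrDefault(name, 0L);
        return total == 0 ? 0.0 : (double) mine / total;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        float[] data = new float[1_000_000];
        // Stand-ins for "shuffle data in/out" and "number crunching".
        timer.time("copy in/out", () -> Arrays.fill(data, 1.0f));
        timer.time("compute", () -> {
            double s = 0;
            for (float f : data) s += Math.sqrt(f);
            if (s < 0) System.out.println(s); // keeps the result observable
        });
        for (String phase : timer.nanosByPhase.keySet()) {
            System.out.printf("%-12s %5.1f%%%n", phase, 100 * timer.share(phase));
        }
    }
}
```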