Question

We have a native C/assembly application that uses the GPU (OpenCL) to encrypt and decrypt large amounts of data with a specific method, and it works perfectly. Another part of the project (web and distribution) is being developed in JEE, and we just need to call the native application or library from it.

We tried calling it as a separate external process using the Process class. The problem is that we cannot control the application (events, handlers, threads, etc.). We also tried porting the C code to Java, but performance collapsed. Apart from running the native code as a process, I'm considering JNA and JNI, but I have some questions.

Questions:

  1. For faster reads and writes, is it possible to exchange data through direct (unmanaged) memory [Java's ByteBuffer#allocateDirect()] with both JNI and JNA?
  2. Is it possible to manage and handle the process in native code, and access the GPU (shared) memory from Java code (via an OpenCL library)?
  3. What about performance? Is JNA faster than JNI?

We have two clustered AMD W7000 devices on Red Hat Linux 6 x64.


Solution

JNA is much slower than JNI, but much easier to use. If performance is not an issue, use JNA.

Using direct buffers has the advantage that the most critical operations (reading and writing the buffer contents from Java) don't go through JNI or JNA and are thus faster. They are intrinsics, which means the JIT turns them into single machine code instructions.
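As a minimal sketch of the Java side of this, the class below allocates a direct buffer in native byte order and fills it. The class and method names are hypothetical and not part of the original answer. On the native side, JNI's GetDirectBufferAddress can return the address of such a buffer; the native call itself is left out here so that the example runs without any native library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
class DirectBufferSketch {
    // Allocates off-heap memory in the platform's native byte order,
    // so native code can read multi-byte values without swapping.
    static ByteBuffer newNativeBuffer(int capacityBytes) {
        return ByteBuffer.allocateDirect(capacityBytes).order(ByteOrder.nativeOrder());
    }

    // Copies the payload into a fresh direct buffer and flips it,
    // leaving position 0 and limit payload.length, ready to be read.
    static ByteBuffer fill(byte[] payload) {
        ByteBuffer buf = newNativeBuffer(payload.length);
        buf.put(payload);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = fill(new byte[] {1, 2, 3, 4});
        // A real application would now pass buf to a native method
        // (assumption: one that accepts a java.nio.ByteBuffer).
        System.out.println(buf.isDirect() + " " + buf.remaining()); // prints "true 4"
    }
}
```

Reusing one large direct buffer across calls, rather than allocating a new one per call, is usually the cheaper pattern, since direct allocation is relatively expensive.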

If the Java code is significantly slower than the C code, it likely hasn't been optimised enough. Generally the GPU should be doing all the work, so if Java is a bit slower it shouldn't make much difference.

For example, if you spend 99% of the time in the GPU and the Java part takes twice as long, the total will be 99% + 2% = 101%, i.e. only 1% slower.
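The same arithmetic can be written as a small helper for trying other ratios. This is only a sketch: the class name, method name, and parameters are hypothetical, and it assumes the GPU time stays constant while only the host-side part slows down.

```java
// Hypothetical helper, for illustration only.
class SlowdownSketch {
    // gpuShare: fraction of the original runtime spent on the GPU (unchanged).
    // hostSlowdown: factor by which the host-side (Java) part gets slower.
    // Returns the new total runtime relative to the original (1.0 = same speed).
    static double relativeTotal(double gpuShare, double hostSlowdown) {
        double hostShare = 1.0 - gpuShare;
        return gpuShare + hostShare * hostSlowdown;
    }

    public static void main(String[] args) {
        // 99% GPU time, host part twice as slow: roughly 1.01, i.e. about 1% slower.
        System.out.println(relativeTotal(0.99, 2.0));
    }
}
```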

OTHER TIPS

I developed a simple DLL containing an empty function that does nothing. I then called that function through both JNA and JNI to measure the cost of the call itself. After many calls, JNI was 30-40 times faster than JNA.
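A measurement like this can be sketched with a small timing loop. The code below is not the harness used for the numbers above. It is a hypothetical example in which a no-op Runnable stands in for the JNI or JNA call, so that it runs without a native library. A dedicated benchmarking tool such as JMH would likely give more reliable numbers.

```java
// Hypothetical timing harness, for illustration only.
class CallOverheadSketch {
    // Runs the call warmupCalls times so the JIT can compile it, then
    // times measuredCalls invocations and returns the average cost in nanoseconds.
    static double averageNanosPerCall(Runnable call, int warmupCalls, int measuredCalls) {
        for (int i = 0; i < warmupCalls; i++) {
            call.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredCalls; i++) {
            call.run();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / measuredCalls;
    }

    public static void main(String[] args) {
        // Placeholder: replace with the empty native function bound via JNI or JNA.
        Runnable noOp = () -> { };
        double ns = averageNanosPerCall(noOp, 100_000, 1_000_000);
        System.out.println("average ns per call: " + ns);
    }
}
```

Running the same loop once with the JNI binding and once with the JNA binding gives the two per-call costs to compare.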

From JNA's official FAQ:

How does JNA performance compare to custom JNI?

JNA direct mapping can provide performance near that of custom JNI. Nearly all the type mapping features of interface mapping are available, although automatic type conversion will likely incur some overhead.

The calling overhead for a single native call using JNA interface mapping can be an order of magnitude (~10X) greater time than equivalent custom JNI (whether it actually does in the context of your application is a different question). In raw terms, the calling overhead is on the order of hundreds of microseconds instead of tens of microseconds. Note that that's the call overhead, not the total call time. This magnitude is typical of the difference between systems using dynamically-maintained type information and systems where type information is statically compiled. JNI hard-codes type information in the method invocation, where JNA interface mapping dynamically determines type information at runtime.

You might expect a speedup of about an order of magnitude moving to JNA direct mapping, and a factor of two or three moving from there to custom JNI. The actual difference will vary depending on usage and function signatures. As with any optimization process, you should determine first where you need a speed increase, and then see how much difference there is by performing targeted optimizations. The ease of programming everything in Java usually outweighs small performance gains when using custom JNI.

The heavy number crunching is done in C/GPU; all your Java <--> C interface does is shuffle data in and out. I'd be surprised if this were a bottleneck.

In any case, write the simplest, clearest code that does the job. If it turns out performance isn't enough, measure where the bottlenecks are, and tackle them one by one until performance is OK. Programmer time is much more valuable than computer time, except for very special circumstances.

Licensed under: CC-BY-SA with attribution