JNA is much slower than JNI, but much easier to use. If performance is not an issue, use JNA.
Using direct buffers has the advantage that the most critical operations don't go through JNI or JNA at all, so they are faster. Their accessors are intrinsics, which means the JIT turns them into single machine code instructions.
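A minimal sketch of the idea, assuming the native side (JNI, JNA or a GPU API) is handed the buffer and reads the memory in place. The class and method names are hypothetical. The point is that filling and reading the buffer is ordinary Java code with no native call per element:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class DirectBufferSketch {
    // Allocates off-heap memory that native code could read without a copy.
    static FloatBuffer allocateFloats(int count) {
        return ByteBuffer.allocateDirect(count * Float.BYTES)
                .order(ByteOrder.nativeOrder()) // use the platform's byte order
                .asFloatBuffer();
    }

    // Plain Java loop: absolute put() on a direct buffer makes no JNI/JNA call.
    static void fillRamp(FloatBuffer buf) {
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, i * 0.5f);
        }
    }

    // Reads the values back, again without leaving Java.
    static float sum(FloatBuffer buf) {
        float total = 0f;
        for (int i = 0; i < buf.capacity(); i++) {
            total += buf.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        FloatBuffer buf = allocateFloats(8);
        fillRamp(buf);
        System.out.println("direct: " + buf.isDirect());
        System.out.println("sum: " + sum(buf));
    }
}
```

On the native side, JNI's GetDirectBufferAddress returns the address of such a buffer, so only the hand-over itself crosses the Java/native boundary.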
If Java code is significantly slower than C, it is likely the code hasn't been optimised enough. Generally the GPU should be doing all the work, so if Java is a bit slower this shouldn't make much difference.
e.g. if you spend 99% of the time in the GPU and the Java part takes twice as long, the total will be 99% + 2% = 101%, i.e. only 1% slower.
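The same arithmetic as a small function (the class and method names are hypothetical). It is Amdahl's law applied the other way round: only the Java share of the run time gets slower, while the GPU share is unchanged:

```java
public class SlowdownEstimate {
    // Relative total run time (1.0 = unchanged) when only the Java part slows down.
    // gpuFraction: share of the original run time spent in the GPU (0..1).
    // javaFactor: how many times longer the Java part takes.
    static double relativeTime(double gpuFraction, double javaFactor) {
        double javaFraction = 1.0 - gpuFraction;
        return gpuFraction + javaFraction * javaFactor;
    }

    public static void main(String[] args) {
        double t = relativeTime(0.99, 2.0);
        System.out.printf("total = %.2f, i.e. %.0f%% slower%n", t, (t - 1) * 100);
    }
}
```

With 99% of the time in the GPU and Java taking twice as long, this gives a total of 1.01, matching the 1% figure above.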