JNI - Passing large amounts of data between Java and Native code

Question 1

What would be the most efficient way to pass the bytes from Java to my native code? I have access to it as a byte array. I don't see any particular advantage to passing it as a byte buffer (wrapping this byte array) vs a byte array here.

The big advantage of a direct ByteBuffer is that you can call GetDirectBufferAddress on the native side and immediately have a pointer to the buffer contents, without any overhead. If you pass a byte array, you have to use GetByteArrayElements and ReleaseByteArrayElements (which might copy the array) or the critical versions, GetPrimitiveArrayCritical and ReleasePrimitiveArrayCritical (which may block the garbage collector while you hold the array). So using a direct ByteBuffer can have a positive impact on your code's performance.
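To illustrate the difference on the Java side, here is a minimal sketch. The class and method names are made up for this example. Wrapping a byte[] with ByteBuffer.wrap gives a heap buffer, for which GetDirectBufferAddress returns NULL on the native side. Getting a direct buffer from an existing array costs one copy.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; not part of the original question's code.
class DirectBufferDemo {
    // Wraps the array without copying. The result is a heap buffer, not a direct one.
    static ByteBuffer wrapHeap(byte[] data) {
        return ByteBuffer.wrap(data);
    }

    // Allocates native memory and copies the array into it (one copy on the Java side).
    static ByteBuffer copyToDirect(byte[] data) {
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip(); // position = 0, limit = data.length, ready to be read
        return direct;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        System.out.println(wrapHeap(data).isDirect());
        System.out.println(copyToDirect(data).isDirect());
    }
}
```

Running main prints false and then true, so only the second buffer is usable with GetDirectBufferAddress.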

As you said, (i) won't work because you don't know how much data the method is going to return. (ii) is too complex because of the custom packaging protocol. I would go for a modified version of (iii): you don't need that object, you can just return an array of ByteBuffers where the first element is the hash and the other elements are the thumbnails. And you can throw away all the memcpys! That's the entire point of a direct ByteBuffer: avoiding copies.

Code:

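#include <jni.h>

extern "C" JNIEXPORT void JNICALL
Java_MyClass_createThumbnails(JNIEnv* env, jobject, jobject input, jobjectArray output)
{
    // output[0] receives the hash, the remaining slots receive the thumbnails.
    jsize nThumbnails = env->GetArrayLength(output) - 1;

    // Both calls only work on direct buffers; otherwise they return NULL and -1.
    void* inputPtr = env->GetDirectBufferAddress(input);
    jlong inputLength = env->GetDirectBufferCapacity(input);
    if (inputPtr == nullptr || inputLength < 0)
        return; // input is not a direct ByteBuffer

    // ...

    void* hash = ...; // a pointer to the hash data
    int hashDataLength = ...;
    void** thumbnails = ...; // an array of pointers, each one points to thumbnail data
    int* thumbnailDataLengths = ...; // an array of ints, each one is the length of the thumbnail data with the same index

    // Wrap the native memory without copying it.
    jobject hashBuffer = env->NewDirectByteBuffer(hash, hashDataLength);
    env->SetObjectArrayElement(output, 0, hashBuffer);

    for (jsize i = 0; i < nThumbnails; i++) {
        jobject thumbnailBuffer = env->NewDirectByteBuffer(thumbnails[i], thumbnailDataLengths[i]);
        env->SetObjectArrayElement(output, i + 1, thumbnailBuffer);
        env->DeleteLocalRef(thumbnailBuffer); // avoid piling up local references in the loop
    }
}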
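For completeness, the Java side of this might look roughly like the sketch below. The native declaration mirrors the function name above, but newOutputArray and toByteArray are hypothetical helpers added for illustration. The copy in toByteArray is only needed if the Java code has to keep the data after the native memory is released.

```java
import java.nio.ByteBuffer;

class MyClass {
    // Assumed Java declaration matching Java_MyClass_createThumbnails.
    // input must be a direct ByteBuffer; output[0] is the hash, output[1..] are the thumbnails.
    native void createThumbnails(ByteBuffer input, ByteBuffer[] output);

    // Hypothetical helper: one slot for the hash plus one per thumbnail.
    static ByteBuffer[] newOutputArray(int nThumbnails) {
        return new ByteBuffer[nThumbnails + 1];
    }

    // Hypothetical helper: copies the readable bytes into a Java-owned array.
    static byte[] toByteArray(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // own position and limit, same underlying memory
        byte[] copy = new byte[view.remaining()];
        view.get(copy);
        return copy;
    }
}
```

The native method itself can only be called once the library that implements it has been loaded, for example with System.loadLibrary.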

Edit:

I only have a byte array available to me for the input. Wouldn't wrapping the byte array in a byte buffer still incur the same tax? I also saw this syntax for arrays: http://developer.android.com/training/articles/perf-jni.html#region_calls. Though a copy is still possible.

GetByteArrayRegion always writes to a buffer that you supply, so it creates a copy every time. I would suggest GetByteArrayElements instead. Copying the array to a direct ByteBuffer on the Java side is also not the best idea, because you still pay for a copy that you might avoid entirely if GetByteArrayElements pins the array.

If I create byte buffers that wrap native data, who is responsible for cleaning it up? I did the memcpy only because I thought Java would have no idea when to free this. This memory could be on the stack, on the heap or from some custom allocator, which seems like it would cause bugs.

If the data is on the stack, then you must copy it: into a Java array, into a direct ByteBuffer that was created in Java code, or to somewhere on the heap (with a direct ByteBuffer that points to that location). If it's on the heap, then you can safely use the direct ByteBuffer that you created with NewDirectByteBuffer, as long as you can ensure that nobody frees the memory. Once the heap memory is freed, you must no longer use the ByteBuffer object. Java does not try to free the native memory when a direct ByteBuffer that was created with NewDirectByteBuffer is GC'd. You have to take care of that manually, because you also created the buffer manually.

Question 2

Byte array
I had to do something similar; I returned a container (a Vector or something) of byte arrays. One of the other programmers implemented this as a callback, which I think is easier, if a bit silly: the JNI code would call a Java method for each response, and then the original call (into the JNI code) would return. This does work okay, though.
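A rough sketch of the callback idea follows. All names here are hypothetical, and produceResults is a pure-Java stand-in for what would really be a native method. In the real version, the JNI code would look up onResult with GetMethodID and invoke it with CallVoidMethod once per response.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not taken from the original answer.
class CallbackDemo {
    // The Java method that the native code would call once per response.
    interface ResultCallback {
        void onResult(byte[] data);
    }

    // Stand-in for the native call: hands each response to the callback, then returns.
    static void produceResults(byte[][] responses, ResultCallback callback) {
        for (byte[] response : responses) {
            callback.onResult(response);
        }
    }

    // Collects everything the callback receives into a list.
    static List<byte[]> collect(byte[][] responses) {
        List<byte[]> collected = new ArrayList<>();
        produceResults(responses, collected::add);
        return collected;
    }
}
```

With this shape, the native side never has to build a container of unknown size; the Java side accumulates the responses as they arrive.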