How to convert Java strings to wide character strings using JNI

https://stackoverflow.com/questions/14320235

15-01-2022

Question

Several months ago, I wrote a Java API that uses JNI to wrap a C API. The C API used char strings, and I used GetStringUTFChars to create the C strings from the Java Strings.

I neglected to think through the problems that might arise with non-ASCII characters.

Since then the creator of the C API has created wide character equivalents to each of his C functions that require or return wchar_t strings. I would like to update my Java API to use these wide character functions and overcome the issue I have with non-ASCII characters.

Having studied the JNI documentation, I am a little confused by the relative merits of using the GetStringChars or GetStringRegion methods.

I am aware that the size of a wchar_t character varies between Windows and Linux and am not sure of the most efficient way to create the C strings (and convert them back to Java strings afterwards).

This is the code I have at the moment which I think creates a string with two bytes per character:

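jsize len;
jchar *Src;

/* Length in UTF-16 code units, not bytes or code points */
len = (*env)->GetStringLength(env, jSrc);
printf("Length of jSrc is %d\n", (int)len);

/* Copy the code units into a buffer with room for a terminator */
Src = (jchar *)malloc((len + 1) * sizeof(jchar));
if (Src != NULL) {
    (*env)->GetStringRegion(env, jSrc, 0, len, Src);
    Src[len] = 0;
}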

However, this will need modifying when the size of a wchar_t differs from jchar.

Solution

Isn't the C API creator willing to take a step back and reimplement with UTF-8? :) Your work would essentially disappear, needing only GetStringUTFChars/NewStringUTF.

jchar is a typedef for unsigned short and is equivalent to the JVM char, which is a UTF-16 code unit. So on Windows, where wchar_t is also 2 bytes and UTF-16, no conversion is needed: just copy the raw code units, as the code you presented does, and allocate accordingly. Don't forget to free the buffer after you're finished with the C API call. Use NewString for the conversion back to jstring.

The only other wchar_t size I am aware of is 4 bytes (most prominently on Linux), which holds UTF-32. And here comes the problem: UTF-32 is not just UTF-16 somehow padded to 4 bytes, because characters outside the Basic Multilingual Plane take two UTF-16 code units (a surrogate pair) but a single UTF-32 unit. Allocating double the amount of memory is just the beginning. There is a substantial conversion to do, like this one, which seems to be sufficiently free.
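To make the surrogate-pair point concrete, here is a minimal sketch of the conversion in both directions. It is not the linked converter, just an illustration: the function names utf16_to_utf32 and utf32_to_utf16 are made up, uint16_t stands in for jchar and uint32_t for a 4-byte wchar_t, and replacing malformed input with U+FFFD is an assumed policy you may want to change.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode len UTF-16 code units into UTF-32 code points.
 * dst must have room for len elements (the output is never longer).
 * Unpaired surrogates become U+FFFD (assumed policy).
 * Returns the number of code points written. */
size_t utf16_to_utf32(const uint16_t *src, size_t len, uint32_t *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = src[i];
        if (c >= 0xD800 && c <= 0xDBFF) {            /* high surrogate */
            if (i + 1 < len && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                c = 0x10000 + ((c - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                i++;                                 /* consumed the low half */
            } else {
                c = 0xFFFD;                          /* no matching low half */
            }
        } else if (c >= 0xDC00 && c <= 0xDFFF) {     /* stray low surrogate */
            c = 0xFFFD;
        }
        dst[n++] = c;
    }
    return n;
}

/* Encode len UTF-32 code points as UTF-16.
 * dst must have room for 2 * len elements.
 * Out-of-range values and surrogate code points become U+FFFD.
 * Returns the number of code units written. */
size_t utf32_to_utf16(const uint32_t *src, size_t len, uint16_t *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = src[i];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            c = 0xFFFD;
        if (c >= 0x10000) {                          /* needs a surrogate pair */
            c -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 + (c >> 10));
            dst[n++] = (uint16_t)(0xDC00 + (c & 0x3FF));
        } else {
            dst[n++] = (uint16_t)c;
        }
    }
    return n;
}
```

The first function would sit between GetStringRegion and the wide C API call when sizeof(wchar_t) is 4. The second produces the jchar array and length that NewString expects for the trip back.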

But if you are not after performance that much and are willing to give up the plain memory copying on Windows, I suggest going from jstring to UTF-8 (which JNI provides natively with documented functionality, although strictly speaking GetStringUTFChars returns modified UTF-8, where supplementary characters appear as surrogate pairs and U+0000 as a two-byte sequence) and then from UTF-8 to UTF-16 or UTF-32 depending on sizeof(wchar_t). That way you make no assumptions about the byte order or UTF encoding each platform gives you. You seem to care about this; I see that you are checking sizeof(jchar), which is 2 for most of the visible universe :)
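The UTF-8 route could look roughly like the sketch below, which decodes the bytes from GetStringUTFChars into a freshly allocated wchar_t string and picks UTF-16 or UTF-32 output from sizeof(wchar_t). The names utf8_to_wcs and put_cp are invented for this example. It accepts both standard 4-byte sequences and the surrogate pairs that modified UTF-8 uses, substitutes U+FFFD for malformed input (an assumed policy), and does not reject overlong forms, so the two-byte encoding of U+0000 turns into an embedded L'\0'.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Append code point c at dst[n]: one unit when wchar_t is 4 bytes,
 * a surrogate pair when wchar_t is 2 bytes and c is above U+FFFF. */
static size_t put_cp(wchar_t *dst, size_t n, uint32_t c)
{
    if (sizeof(wchar_t) == 2 && c >= 0x10000) {
        c -= 0x10000;
        dst[n++] = (wchar_t)(0xD800 + (c >> 10));
        dst[n++] = (wchar_t)(0xDC00 + (c & 0x3FF));
    } else {
        dst[n++] = (wchar_t)c;
    }
    return n;
}

/* Convert a NUL-terminated (modified) UTF-8 string to a malloc'ed
 * wchar_t string. The caller frees the result. Returns NULL on
 * allocation failure. */
wchar_t *utf8_to_wcs(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    /* Every sequence yields no more wchar_t units than it has bytes. */
    wchar_t *dst = malloc((strlen(s) + 1) * sizeof(wchar_t));
    size_t n = 0;
    uint32_t high = 0;              /* pending high surrogate, 0 if none */

    if (dst == NULL)
        return NULL;
    while (*p) {
        uint32_t c;
        int extra;
        if (*p < 0x80)                { c = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { c = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { c = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { c = *p & 0x07; extra = 3; }
        else                          { c = 0xFFFD;    extra = 0; }
        p++;
        while (extra > 0 && (*p & 0xC0) == 0x80) {   /* continuation bytes */
            c = (c << 6) | (uint32_t)(*p++ & 0x3F);
            extra--;
        }
        if (extra > 0 || c > 0x10FFFF)
            c = 0xFFFD;                              /* truncated or out of range */

        if (high) {
            if (c >= 0xDC00 && c <= 0xDFFF) {        /* completes the pair */
                n = put_cp(dst, n, 0x10000 + ((high - 0xD800) << 10) + (c - 0xDC00));
                high = 0;
                continue;
            }
            n = put_cp(dst, n, 0xFFFD);              /* unpaired high surrogate */
            high = 0;
        }
        if (c >= 0xD800 && c <= 0xDBFF) { high = c; continue; }
        if (c >= 0xDC00 && c <= 0xDFFF) c = 0xFFFD;  /* stray low surrogate */
        n = put_cp(dst, n, c);
    }
    if (high)
        n = put_cp(dst, n, 0xFFFD);
    dst[n] = L'\0';
    return dst;
}
```

In a JNI function you would pass the pointer from GetStringUTFChars to utf8_to_wcs, call ReleaseStringUTFChars, hand the result to the wide C API, and free it afterwards. The reverse direction needs a matching encoder before NewStringUTF.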

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow