Question

I'm looking into using ICU for Unicode string processing in a native Node.js module because it seems to me that v8::String (according to these docs) doesn't have a C++ API for this purpose.

To my knowledge V8 expects UTF-16 in ExternalStringResource and other APIs, so I'd like to use ICU for UTF-16 processing.
I specifically need to:

  • Iterate over the characters (code points, not just the 16-bit code units) of a UTF-16 string
  • Tell the number of characters (code points, not just the 16-bit code units) that a UTF-16 string contains

So I looked at the ICU documentation and found the UnicodeString and CharacterIterator classes. However, UnicodeString doesn't have a fromUTF16 method, only fromUTF8 and fromUTF32.

The other thing I'm unsure about: does the UnicodeString constructor copy the data I give it? I'd much prefer a zero-copy approach in which an immutable object simply uses the buffer I point it at, without performing any copy operations.

I'm also unsure whether I can just use UCharIterator (assuming I can somehow obtain a UChar* from my UTF-16 strings).

So my question is: How do I use ICU for the above purposes?

Thanks in advance for your answers!

Solution

UnicodeString uses UTF-16 for storage by default. That's why it only has fromUTF8 and fromUTF32: from UTF-16 there is no conversion to be made.

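The ordinary constructors do copy the data. UnicodeString is an owning string, much like std::string. The exception is the read-only aliasing constructor (the one that takes isTerminated, text, and textLength): it points the UnicodeString at your existing buffer and copies only if the string is later modified, so the buffer must outlive the object.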

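You can use UCharIterator if you don't want to copy the data; uiter_setString sets one up over an existing UChar buffer. UChar is a 16-bit value. You can force it to be whatever 16-bit type you prefer working with by defining the UCHAR_TYPE macro. The ICU documentation describes the definition of UChar as follows: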

Define UChar to be UCHAR_TYPE, if that is #defined (for example, to char16_t), or wchar_t if that is 16 bits wide; always assumed to be unsigned.

If neither is available, then define UChar to be uint16_t.

This makes the definition of UChar platform-dependent but allows direct string type compatibility with platforms with 16-bit wchar_t types.
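
To make the two requested operations concrete, here is a minimal sketch of code-point iteration and counting over a raw UTF-16 buffer without copying it. It is written with only the standard library so it compiles without ICU; the helper names next_code_point and count_code_points are hypothetical, not ICU API. In ICU itself the corresponding facilities are the U16_NEXT macro (unicode/utf16.h) and u_countChar32 (unicode/ustring.h), noted in the comments. The sketch assumes char16_t code units and passes unpaired surrogates through unchanged.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decode the code point that starts at s[i] and
// advance i past it. This mirrors what ICU's U16_NEXT(s, i, length, c)
// does; an unpaired surrogate is returned as-is.
inline char32_t next_code_point(const char16_t* s, std::size_t& i,
                                std::size_t length) {
    char32_t c = s[i++];
    // A lead surrogate followed by a trail surrogate forms one code point.
    if (c >= 0xD800 && c <= 0xDBFF && i < length) {
        char16_t trail = s[i];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++i;
            c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
        }
    }
    return c;
}

// Hypothetical helper: count code points by walking the buffer in place.
// ICU equivalent: u_countChar32(s, length).
inline std::size_t count_code_points(const char16_t* s, std::size_t length) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++count) {
        next_code_point(s, i, length);
    }
    return count;
}
```

Both helpers only read from the pointer they are given, so they can be pointed directly at the buffer behind an ExternalStringResource, as long as that buffer stays alive while they run.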

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow