Question

In Objective-C...

If I have a character like "∆", how can I get its Unicode value and then determine whether it falls within a certain range of values?

For example, I want to know whether a certain character is in the Unicode range U+1F300 to U+1F6FF.

Solution

NSString stores its contents as UTF-16, so code points in the range you're looking for (U+1F300 to U+1F6FF) are stored as a surrogate pair (two 16-bit code units, four bytes in total). Despite its name, characterAtIndex: (and the unichar type) knows nothing about code points: it returns the single 16-bit code unit at the index you give it (the 55357 you're seeing is 0xD83D, the lead surrogate of the code point in UTF-16).

To examine the raw code points, you'll want to convert the string/characters into UTF-32 (which encodes each code point directly as one 32-bit value). To do this, you have a few options:

  1. Get all the UTF-16 code units that make up the code point, and use either the standard surrogate-pair decoding algorithm or CFStringGetLongCharacterForSurrogatePair to convert each surrogate pair to UTF-32.

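  2. Use either dataUsingEncoding: or getBytes:maxLength:usedLength:encoding:options:range:remainingRange: to convert the NSString to UTF-32, and interpret the resulting bytes as uint32_t values. An explicit-endian encoding such as NSUTF32LittleEndianStringEncoding avoids having to skip a leading byte-order mark.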

  3. Use a library like ICU.
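
As a rough illustration of option 1, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). It assumes you have already copied the string's UTF-16 code units into a buffer, for example with characterAtIndex: or getCharacters:range:. The function names (is_lead_surrogate, is_trail_surrogate, combine_surrogates, code_point_at, in_range) are made up for this example and are not part of Foundation or CoreFoundation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lead (high) surrogates occupy 0xD800..0xDBFF. */
bool is_lead_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* Trail (low) surrogates occupy 0xDC00..0xDFFF. */
bool is_trail_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a valid lead/trail pair into one code point (UTF-32 value).
   Each surrogate carries 10 bits; the result is offset by 0x10000. */
uint32_t combine_surrogates(uint16_t lead, uint16_t trail) {
    uint32_t high = (uint32_t)(lead - 0xD800) << 10;
    uint32_t low  = (uint32_t)(trail - 0xDC00);
    return 0x10000u + (high | low);
}

/* Decode the code point that starts at units[i] (hypothetical helper).
   An unpaired surrogate is returned unchanged; a real caller may
   prefer to treat that case as an error. */
uint32_t code_point_at(const uint16_t *units, size_t count, size_t i) {
    uint16_t u = units[i];
    if (is_lead_surrogate(u) && i + 1 < count && is_trail_surrogate(units[i + 1])) {
        return combine_surrogates(u, units[i + 1]);
    }
    return u;
}

/* Inclusive range test, e.g. in_range(cp, 0x1F300, 0x1F6FF). */
bool in_range(uint32_t cp, uint32_t lo, uint32_t hi) {
    return cp >= lo && cp <= hi;
}
```

With the code units {0x2206, 0xD83D, 0xDE00} (that is, "∆" followed by U+1F600), code_point_at gives 0x2206 at index 0 and 0x1F600 at index 1, and only the second passes in_range(cp, 0x1F300, 0x1F6FF). When walking a whole string, remember to advance the index by two after a surrogate pair.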

Licensed under: CC-BY-SA with attribution