Question

Here is the story. I'm using a C library for data transmission, and everything works well except for certain emoji. On the receiving side the message arrives as char *message, and printf prints it correctly. But stringWithCString:encoding: (or stringWithUTF8String:, etc.) returns nil for certain emoji.

So I printed out each byte. Here are the emoji that succeed:

"\xe2\x9a\xbd"

"\xe2\x9a\xbe\xef\xb8\x8f"

"\xe2\x98\x81\xef\xb8\x8f"

And here are the emoji that fail:

"\xed\xa0\xbc\xed\xbe\x82"

"\xed\xa0\xbc\xed\xbf\x80"

"\xed\xa0\xbc\xed\xbc\x88"

I have been stuck on this for days.

Any ideas?

Thanks so much!!


Solution

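It's because your bytes are not a valid UTF-8 encoded string. Decoded by the UTF-8 rules, "\xed\xa0\xbc" yields the code point U+D83C, which lies in the high surrogate block (U+D800 to U+DBFF). Likewise, the second half of each failing sequence (for example "\xed\xbe\x82", which yields U+DF82) lies in the low surrogate block (U+DC00 to U+DFFF). Surrogates are not valid characters; they are reserved for UTF-16, which uses a high/low pair to represent one code point above U+FFFF. UTF-8 is not allowed to encode surrogate code points, so a strict decoder rejects the whole string and you get nil.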
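If you cannot fix the sender, one possible workaround is to repair the bytes on the receiving side before creating the string. The sketch below is only an illustration under some assumptions: the function names cesu8_to_utf8 and read_surrogate are made up for this example (they are not part of your C library or of Foundation), and it assumes the sender wrote each UTF-16 surrogate as its own 3-byte sequence, which is what your failing samples look like. It combines each high/low pair into one code point and re-encodes it as a proper 4-byte UTF-8 sequence. For your first failing sample, U+D83C and U+DF82 combine to U+1F382, whose UTF-8 form is "\xf0\x9f\x8e\x82".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: if p points at a 3-byte sequence ED A0..BF 80..BF,
   return the surrogate value it encodes (0xD800..0xDFFF); otherwise 0. */
static uint32_t read_surrogate(const unsigned char *p, size_t remaining)
{
    if (remaining < 3 || p[0] != 0xED) return 0;
    if ((p[1] & 0xE0) != 0xA0 || (p[2] & 0xC0) != 0x80) return 0;
    return 0xD000u | ((uint32_t)(p[1] & 0x3F) << 6) | (uint32_t)(p[2] & 0x3F);
}

/* Hypothetical helper: copy `in` to `out`, replacing every encoded
   surrogate pair (6 bytes) with the 4-byte UTF-8 form of the real code
   point. All other bytes, including unpaired surrogates, are copied
   unchanged. The output is NUL-terminated.
   Returns the output length without the NUL, or (size_t)-1 if `out`
   is too small. */
size_t cesu8_to_utf8(const char *in, size_t in_len, char *out, size_t out_cap)
{
    const unsigned char *src = (const unsigned char *)in;
    unsigned char *dst = (unsigned char *)out;
    size_t i = 0, o = 0;

    if (out_cap == 0) return (size_t)-1;

    while (i < in_len) {
        uint32_t hi = read_surrogate(src + i, in_len - i);
        uint32_t lo = 0;

        /* Only look for a low surrogate right after a high surrogate. */
        if (hi >= 0xD800 && hi <= 0xDBFF)
            lo = read_surrogate(src + i + 3, in_len - i - 3);

        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            /* Standard UTF-16 pair arithmetic. */
            uint32_t cp = 0x10000u + ((hi - 0xD800u) << 10) + (lo - 0xDC00u);
            if (o + 4 + 1 > out_cap) return (size_t)-1;
            dst[o++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            i += 6;
        } else {
            if (o + 1 + 1 > out_cap) return (size_t)-1;
            dst[o++] = src[i++];
        }
    }
    dst[o] = '\0';
    return o;
}
```

The output is never longer than the input, so a buffer of strlen(message) + 1 bytes is enough. The repaired buffer can then be passed to stringWithUTF8String:. Fixing the sender so that it emits real UTF-8 in the first place is still the cleaner solution, if you have control over it.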

Licensed under: CC-BY-SA with attribution