Question

Does anyone know how Facebook encodes emoji that require UTF-16 surrogate pairs in the Graph API?

Emoji in the Basic Multilingual Plane, which need no surrogates, seem fine. For example, ❤️ (HEAVY BLACK HEART, though it is rendered red on iOS/OS X) comes through as \u2764\ufe0f, which matches the UTF-16 hex codes ("Formal Unicode Notation") listed for it at iemoji.com.

And indeed, in Ruby when parsing the JSON output from the API:

ActiveSupport::JSON.decode('"\u2764\ufe0f"')

you correctly get:

"❤️"

However, to pick another emoji, 💤 (SLEEPING SYMBOL): Facebook returns \udbba\udf59. This does not correspond to anything I can find in any Unicode resource, for example its entry at iemoji.com.

And when I attempt to decode in Ruby using the same method above:

ActiveSupport::JSON.decode('"\udbba\udf59"')

I get:

"󾭙"

Any idea what's going on here?

Solution

Answering my own question, though most of the credit belongs to @bobince for showing me the way in the comments.

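The answer is that Facebook encodes emoji using the "Google" encoding, one of the vendor-specific encodings that Unicode emoji tables list alongside the unified code points. The surrogate pair \udbba\udf59 decodes to U+FEB59, which lies in Supplementary Private Use Area-A. That is why it does not show up in standard Unicode references and renders as an unknown glyph.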
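For anyone who wants to check the code point behind \udbba\udf59 by hand, here is a minimal sketch in plain Ruby (no gems). The helper names surrogate_pair_to_codepoint and private_use? are my own illustration and not part of any library. It combines the two UTF-16 code units using the standard surrogate formula and tests whether the result falls in a private use range.

```ruby
# Combine a UTF-16 surrogate pair into a single Unicode code point.
# (Illustrative helper, not part of any library.)
def surrogate_pair_to_codepoint(high, low)
  raise ArgumentError, "not a high surrogate" unless (0xD800..0xDBFF).cover?(high)
  raise ArgumentError, "not a low surrogate" unless (0xDC00..0xDFFF).cover?(low)

  0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
end

# True when the code point lies in one of the Unicode private use ranges:
# the BMP Private Use Area or Supplementary Private Use Area-A / -B.
def private_use?(codepoint)
  (0xE000..0xF8FF).cover?(codepoint) ||
    (0xF0000..0xFFFFD).cover?(codepoint) ||
    (0x100000..0x10FFFD).cover?(codepoint)
end

facebook = surrogate_pair_to_codepoint(0xDBBA, 0xDF59) # what the Graph API returned
unified  = surrogate_pair_to_codepoint(0xD83D, 0xDCA4) # UTF-16 form of U+1F4A4

puts format("U+%X private use? %s", facebook, private_use?(facebook))
puts format("U+%X private use? %s", unified, private_use?(unified))
```

The first value comes out as U+FEB59 inside a private use range, while the unified SLEEPING SYMBOL (U+1F4A4) does not, which is consistent with Facebook returning a vendor-specific code point.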

I have created a Ruby gem called emojivert that converts from one encoding to another, including from "Google" to "Unified". It is based on an existing project called rails-emoji.

So the failing example above would be fixed by doing:

string = ActiveSupport::JSON.decode('"\udbba\udf59"')
# => "󾭙"
fixed = Emojivert.google_to_unified(string)
# => "💤"
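
If you would rather not depend on a gem, the conversion boils down to a lookup table. Below is a minimal sketch, assuming a hypothetical GOOGLE_TO_UNIFIED hash and remap_google_emoji helper. Neither is emojivert's actual implementation, and the table contains only the single mapping from this question, so a real version would need the full list of Google code points.

```ruby
# Hypothetical, deliberately incomplete table mapping Google private-use
# code points to unified Unicode code points. Only the pair from this
# question is included.
GOOGLE_TO_UNIFIED = {
  0xFEB59 => 0x1F4A4 # SLEEPING SYMBOL
}.freeze

# Replace every character found in the table; leave everything else as-is.
# (Illustrative helper, not the gem's API.)
def remap_google_emoji(string)
  string.each_char.map { |char|
    unified = GOOGLE_TO_UNIFIED[char.ord]
    unified ? [unified].pack("U") : char
  }.join
end

raw   = [0xFEB59].pack("U")            # the character Facebook returned
fixed = remap_google_emoji("zzz #{raw}")

# Print the code points so the result is visible even without emoji fonts.
puts fixed.codepoints.map { |c| format("U+%04X", c) }.join(" ")
```

Characters that are not in the table pass through unchanged, so ordinary text and already-unified emoji such as ❤️ are not affected.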
License: CC-BY-SA with attribution