How does Facebook encode emoji in the JSON Graph API?
26-12-2019
Question
Does anyone know how Facebook encodes emoji that require UTF-16 surrogate pairs in the Graph API?
Emoji that fit in a single UTF-16 code unit seem fine. For example, ❤️ (HEAVY BLACK HEART, though it renders red on iOS/OS X) comes through as \u2764\ufe0f
which matches the UTF-16 hex codes ("Formal Unicode Notation") listed for it at iemoji.com.
And indeed, in Ruby when parsing the JSON output from the API:
ActiveSupport::JSON.decode('"\u2764\ufe0f"')
you correctly get:
"❤️"
However, to pick another emoji, 💤 (SLEEPING SYMBOL), Facebook returns \udbba\udf59.
This corresponds to nothing I can find in any Unicode resource, for example the entry for this emoji at iemoji.com.
And when I attempt to decode it in Ruby using the same method as above:
ActiveSupport::JSON.decode('"\udbba\udf59"')
I get:
""
Any idea what's going on here?
Solution
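Answering my own question, though most of the credit belongs to @bobince for showing me the way in the comments.
The answer is that Facebook encodes emoji using the "Google" encoding listed in this Unicode table. That encoding uses private-use code points rather than the standard ("Unified") ones, which is why \udbba\udf59 does not appear in the usual emoji references.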
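To see where the returned sequence actually lands, it helps to combine the surrogate pair by hand. The following is a minimal sketch using only Ruby's standard library; the helper name `surrogate_pair_to_codepoint` is made up for illustration, and the stdlib `JSON` parser is used here in place of ActiveSupport.

```ruby
require 'json'

# Combine a UTF-16 high/low surrogate pair into a single Unicode code point.
# (Hypothetical helper name, for illustration only.)
def surrogate_pair_to_codepoint(high, low)
  0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
end

# The pair Facebook returned for the sleeping symbol.
codepoint = surrogate_pair_to_codepoint(0xDBBA, 0xDF59)
puts format('U+%X', codepoint) # prints U+FEB59

# U+F0000..U+FFFFD is Supplementary Private Use Area-A, so no standard
# character is assigned to this code point.
private_use = (0xF0000..0xFFFFD).cover?(codepoint)
puts private_use # prints true

# The JSON parser performs the same combination when it decodes the escape.
decoded = JSON.parse('["\udbba\udf59"]').first
puts decoded.codepoints == [codepoint] # prints true
```

By contrast, the Unified code point for 💤 is U+1F4A4, which would be escaped in JSON as \ud83d\udca4.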
I have created a Ruby gem called emojivert that can convert from one encoding to another, including from "Google" to "Unified". It is based on an existing project called rails-emoji.
So the failing example above would be fixed by doing:
string = ActiveSupport::JSON.decode('"\udbba\udf59"')
# => ""
fixed = Emojivert.google_to_unified(string)
# => "💤"
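Conceptually, this kind of conversion is a lookup from one code point to another. The sketch below is not emojivert's actual implementation. The table name and method are hypothetical, and the table holds only the single mapping from the example above, so a real converter would need one entry per emoji.

```ruby
# Hypothetical lookup table: Google private-use code point => Unified code point.
# Only the pair from the example above is included.
GOOGLE_TO_UNIFIED = {
  0xFEB59 => 0x1F4A4 # SLEEPING SYMBOL
}.freeze

# Replace each known Google private-use character with its Unified
# equivalent; every other character passes through unchanged.
def google_to_unified(string)
  converted = string.each_char.map do |char|
    unified = GOOGLE_TO_UNIFIED[char.ord]
    unified ? [unified].pack('U') : char
  end
  converted.join
end

# Build the character that decoding "\udbba\udf59" yields (U+FEB59).
raw = [0xFEB59].pack('U')
fixed = google_to_unified("zzz #{raw}")
puts fixed
```

Characters with no entry in the table are left alone, so ordinary text and already-Unified emoji are unaffected.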