Facebook Graph API - non English album names

https://stackoverflow.com/questions/3806215

25-09-2019

Question

I am trying to do a simple thing: get all my albums. The problem is that the album names are non-English (they are in Hebrew).

The code that retrieves the albums:

string query = "https://graph.facebook.com/me/albums?access_token=...";
string result = webClient.DownloadString(query);

And this is what one of the returned albums looks like:

{
     "id": "410329886431",
     "from": {
        "name": "Noam Levinson",
        "id": "500786431"
     },
     "name": "\u05ea\u05e2\u05e8\u05d5\u05db\u05ea \u05d2\u05de\u05e8 \u05e9\u05e0\u05d4 \u05d0",
     "location": "\u05e9\u05e0\u05e7\u05e8",
     "link": "http://www.facebook.com/album.php?aid=193564&id=500786431",
     "count": 27,
     "type": "normal",
     "created_time": "2010-07-18T06:20:27+0000",
     "updated_time": "2010-07-18T09:29:34+0000"
  },

As you can see, the problem is in the "name" property. Instead of Hebrew letters I get those codes (the codes are not garbage, they are consistent: each one probably represents a single Hebrew letter). The question is, how can I convert those codes to the non-English language (in my case, Hebrew)? Or maybe the problem is in how I retrieve the albums with the webClient object. Should I change webClient.Encoding somehow?

What can I do to solve this problem?

Thanks in advance.

Solution

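That's how Unicode is represented in JSON (see the char definition in the JSON grammar). They are escape sequences in which the four hex digits give a UTF-16 code unit, which for characters in the BMP (Hebrew included) is simply the Unicode code point of the character. Since there are only four hex digits available, a single escape can only cover the BMP; a character outside the BMP is written as two consecutive escapes forming a UTF-16 surrogate pair.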

Any decent JSON parser will transform these Unicode escape sequences into properly encoded characters for you - provided the target encoding supports the character in the first place.
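
As an illustration of what such a parser does, here is a minimal Python sketch (Python and its standard json module are an assumption here; the question itself uses C#). The raw string is abbreviated from the album in the question, and decode_album is a hypothetical helper name.

```python
import json

# Abbreviated from the album in the question. The raw string (r'...') keeps
# the backslashes literal, just as they appear in the HTTP response body.
raw = r'{"id": "410329886431", "location": "\u05e9\u05e0\u05e7\u05e8", "count": 27}'


def decode_album(text):
    # json.loads resolves every \uXXXX escape while parsing the object.
    return json.loads(text)


album = decode_album(raw)
location = album["location"]

# Four escapes in the response became four real Hebrew characters.
print(len(location))                            # 4
print([f"U+{ord(ch):04X}" for ch in location])

# A character outside the BMP arrives as two escapes (a UTF-16 surrogate
# pair), which the parser joins into a single character.
emoji = json.loads(r'"\ud83d\ude00"')
print(len(emoji), f"U+{ord(emoji):X}")          # 1 U+1F600
```

On a UTF-8 terminal, print(location) would show the Hebrew text itself. In the question's C# code, the equivalent step would be handled by whichever JSON library is used to parse result.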

OTHER TIPS

I had the same problem with the Facebook Graph API and escaped Unicode Romanian characters. I used PHP, but you can probably translate the regexp method into JavaScript.

Method 1 (PHP):

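// Single quotes keep the backslashes literal, as in the raw API response.
$str = '\u05ea\u05e2\u05e8\u05d5\u05db\u05ea';
// Turn each \uXXXX escape into the HTML numeric entity &#xXXXX;.
function esc_unicode2html($string) {
    return preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x$1;', $string);
}
echo esc_unicode2html($str);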
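
For comparison, the same regex idea can be sketched in Python (an assumption; the answer above uses PHP). escapes_to_html_entities is a hypothetical name. Like the PHP version, it only rewrites text and does no real JSON parsing, so a proper JSON parser is usually the safer choice.

```python
import html
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def escapes_to_html_entities(text):
    """Rewrite each \\uXXXX escape as the HTML numeric entity &#xXXXX;."""
    return ESCAPE.sub(r"&#x\1;", text)


raw = r"\u05ea\u05e2\u05e8\u05d5\u05db\u05ea"
entities = escapes_to_html_entities(raw)
print(entities)  # &#x05ea;&#x05e2;&#x05e8;&#x05d5;&#x05db;&#x05ea;

# A browser resolves the entities when rendering; html.unescape does the same here.
print(len(html.unescape(entities)))  # 6
```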

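Method 2 (PHP): send the page as UTF-8. It probably also works if you declare the charset directly in the HTML. Note that this only tells the browser how to read characters that are already decoded; it does not convert the \u escapes by itself: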

header('content-type:text/html;charset=utf-8');

These are Unicode character codes. The \u sequence tells the parser that the next four characters are hex digits forming a Unicode character number. What these characters look like depends on your font; if someone does not have a font that covers them, they may just appear as a row of square boxes. That's about as much as I know; Unicode is complicated.

Licensed under: CC-BY-SA with attribution