My JSON feed is here:
http://america.aljazeera.com/bin/ajam/api/story.json?path=/content/ajam/watch/shows/america-tonight/articles/2014/4/28/the-dark-side-oftheoilboomhumantraffickingintheheartland
It is a JSON representation of the HTML page below, where the same en dash character appears in the subtitle:
http://america.aljazeera.com/watch/shows/america-tonight/articles/2014/4/28/the-dark-side-oftheoilboomhumantraffickingintheheartland.html
The En Dash is in the 2nd key (description):
description: "In a North Dakota town that was once dying, oil and money are flowing – and bringing big-city problems",
after the word "flowing".
The page has the following HTTP header:
Content-Type: application/json;charset=UTF-8
which can be seen by requesting it with curl -v or curl -I.
Downloading it in Ruby using HTTParty like so:
> r = HTTParty.get('http://america.aljazeera.com/bin/ajam/api/story.json?path=/content/ajam/watch/shows/america-tonight/articles/2014/4/28/the-dark-side-oftheoilboomhumantraffickingintheheartland')
> r['description']
=> "In a North Dakota town that was once dying, oil and money are flowing –\u0080\u0093 and bringing big-city problems"
mangles it, as seen above. After much research I realized that the mangled sequence is a representation of the character's UTF-8 bytes in hex, as seen here:
http://www.fileformat.info/info/unicode/char/2013/index.htm
specifically, this:
UTF-8 (hex) 0xE2 0x80 0x93 (e28093)
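To illustrate what I suspect is happening, here is a minimal Ruby sketch. It assumes the three UTF-8 bytes of the en dash are being decoded as ISO-8859-1 instead of UTF-8 somewhere along the way. That is only my guess, not something I have confirmed about HTTParty, and the variable names are just for illustration.

```ruby
# The en dash (U+2013) and its UTF-8 bytes as a hex string.
en_dash = "\u2013"
utf8_hex = en_dash.bytes.map { |b| format("%02x", b) }.join

# Assumption: a client wrongly labels those bytes as ISO-8859-1 and then
# converts to UTF-8, so every byte turns into a separate character.
mangled = en_dash.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Reversing the mistake: convert back to the original bytes, then label
# them as UTF-8 again.
repaired = mangled.encode("ISO-8859-1").force_encoding("UTF-8")

puts utf8_hex         # hex of the UTF-8 bytes
puts mangled.length   # number of characters after the wrong decode
puts repaired == en_dash
```

If that guess is right, the mangled string has three characters, and its last two are U+0080 and U+0093, which lines up with the \u0080\u0093 in the HTTParty output above.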
This data is later fed into an iPhone app and an Android app. In the Android app it looks like the attached screenshot. On an iPhone it looks fine - I think because only the first character is rendered, and that is a regular ASCII dash, while the next two characters are skipped.
Finally, downloading it in JavaScript using AJAX does seem to handle it correctly:
> r = json['description'].match(/flowing (.*) and/)[1]
> "–"
> r
> "–"
> r.length
> 3
> r.toString(16)
> "–"
So...what is going on? What can I do to fix it? Is the fault with the server or with my code?