Unicode character sent to server is returned as garbage

https://stackoverflow.com/questions/19673050

01-07-2022

Question

Update: After further investigation I've managed to narrow the problem down to the JSON encoder. Passing the input straight through works fine; the problem only appears once the input goes through MultiJson.encode.

I'm sending the following to a RESTful web service via curl:

$ curl -v -X POST "http://my/url" -d "{\"body\": \"💳\"}"

The character that you probably can't see is the Credit Card emoji character, which is U+1F4B3.

The response I get back from the service is essentially:

< HTTP/1.1 200 OK
< Date: Wed, 30 Oct 2013 02:38:04 GMT
< Content-Type: application/json;charset=utf-8
< Content-Length: 266
< Connection: close
< 
{ [data not shown]
100   304  100   266  100    38    936    133 --:--:-- --:--:-- --:--:--   936
* Closing connection 0
{
  "body": "\uf4b3"
}

This encoded character does not correspond to what I sent and I would expect it to be returned as sent (in this case).
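
One possible reading of the response, offered as a guess rather than a diagnosis: U+1F4B3 lies outside the Basic Multilingual Plane, so a JSON \u escape for it needs a UTF-16 surrogate pair, and \uf4b3 is exactly what remains when the code point is cut down to 16 bits. The plain-Ruby sketch below only does that arithmetic. The variable names are illustrative and nothing here comes from the server's code.

```ruby
# U+1F4B3 (credit card emoji), written as an escape so the source file's
# own encoding does not matter.
card = "\u{1F4B3}"
code_point = card.ord

# Keeping only the low 16 bits reproduces the value seen in the response.
truncated = code_point & 0xFFFF

# A \u escape for a code point above U+FFFF is a UTF-16 surrogate pair.
offset = code_point - 0x10000
high = 0xD800 + (offset >> 10)
low  = 0xDC00 + (offset & 0x3FF)

# Single-quoted strings keep the backslash-u literally.
truncated_escape = format('\u%04x', truncated)
expected_escape  = format('\u%04x\u%04x', high, low)

puts truncated_escape  # \uf4b3
puts expected_escape   # \ud83d\udcb3
```

If that reading is right, a correct encoder would return either the raw UTF-8 character or the surrogate pair \ud83d\udcb3.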

I have access to the server's source code. It's built on Ruby, Sinatra and ActiveRecord. There is some amount of processing going on before the response is sent:

First the content is passed through ERB::Util.html_escape
Then, a series of regexes is applied via str.gsub!(reg, " ### ")
Finally, the response is returned via MultiJson.encode
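
A rough way to see which of these three steps alters the character is to record the code points after each one. The sketch below is only an approximation of the pipeline. The regex is a made-up placeholder, and the standard library's JSON stands in for MultiJson, so it shows the method rather than the real server's behaviour.

```ruby
require 'erb'
require 'json'

card = "\u{1F4B3}"
stages = {}

# Step 1: HTML-escape the content.
escaped = ERB::Util.html_escape(card)
stages[:html_escape] = escaped.codepoints

# Step 2: apply a substitution (hypothetical pattern, not the server's).
scrubbed = escaped.dup
scrubbed.gsub!(/\bsecret\b/, ' ### ')
stages[:gsub] = scrubbed.codepoints

# Step 3: encode to JSON, then decode again to inspect the result.
# Stdlib JSON is an assumption here; the server uses MultiJson.
json = JSON.generate('body' => scrubbed)
stages[:json_round_trip] = JSON.parse(json)['body'].codepoints

stages.each do |name, cps|
  puts "#{name}: #{cps.map { |c| format('U+%04X', c) }.join(' ')}"
end
```

Any stage whose output is not U+1F4B3 is the one to look at.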

I'm not a Ruby person, but can provide additional details if necessary. Would appreciate someone pointing me in the right direction. Thanks!

Solution 2

We were able to solve this problem by migrating to a different JSON encoding engine:

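get "/foo" do
  resp = "💳"

  puts MultiJson.adapter         # adapter in use before the switch
  puts MultiJson.dump(resp)      # Fails: the emoji comes back garbled

  MultiJson.engine = :jrjackson  # switch the JSON backend to JrJackson
  puts MultiJson.adapter         # confirm the new adapter is active
  puts MultiJson.dump(resp)      # Succeeds
end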
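
To check whether a given encoder handles the character, a round trip (encode, decode, compare) is usually enough. The helper below is a hypothetical sketch: round_trips? is not part of any library, and the standard library's JSON is used only so the example runs without extra gems.

```ruby
require 'json'

# Hypothetical helper: true when encoding and then decoding returns the input.
def round_trips?(value, encode:, decode:)
  decode.call(encode.call(value)) == value
end

payload = { 'body' => "\u{1F4B3}" }

ok = round_trips?(payload,
                  encode: ->(obj) { JSON.generate(obj) },
                  decode: ->(str) { JSON.parse(str) })
puts ok
```

With MultiJson loaded, the same helper could presumably be called with MultiJson.method(:dump) and MultiJson.method(:load) before and after switching adapters.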

OTHER TIPS

The first thing to check is whether the character is getting into your application the way you think it is. Ruby has default "internal" and "external" encodings, and a string that arrives through IO may or may not still carry the expected encoding by the time it is passed around.

That isn't to say encodings are hard to manage or confusing. It's all fairly straightforward; the point is that each of these settings can be configured or changed.

To see what you are starting with, check the input's encoding as early in your program as you can:

params[:foo].encoding
=> #<Encoding:UTF-8>

If it's not UTF-8, then you need to set your environment and/or your IO mechanism to use UTF-8.
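
As an illustration of what a wrong encoding tag looks like and one way to correct it, the sketch below builds the four UTF-8 bytes of U+1F4B3 tagged as binary and then retags them. It assumes the bytes really are UTF-8 and only the label is wrong. If the bytes are in some other encoding, String#encode is the tool to use instead.

```ruby
# The four UTF-8 bytes of U+1F4B3, tagged as binary (ASCII-8BIT), which is
# how a body read from a raw socket or IO may arrive.
raw = "\xF0\x9F\x92\xB3".b
puts raw.encoding

# Retag the same bytes as UTF-8; no bytes are changed.
utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts utf8.encoding
puts utf8.valid_encoding?        # true when the bytes are well-formed UTF-8
puts format('U+%04X', utf8.ord)  # U+1F4B3
```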

Starting in Ruby 2.0, the default source encoding is (praise the gods) UTF-8. So if you aren't on Ruby 2.0 or later and are able to upgrade, start there.

If you don't have that option, then you need to set the default encoding yourself, although it seems Sinatra already sets it to UTF-8.
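
A minimal sketch of setting the defaults by hand, assuming it runs early in the application's entry point before any IO takes place:

```ruby
# Encoding assumed for data read from files, sockets and other IO.
Encoding.default_external = Encoding::UTF_8

# Encoding that strings are transcoded to after being read.
Encoding.default_internal = Encoding::UTF_8

puts Encoding.default_external
puts Encoding.default_internal
```

Whether this is needed depends on the Ruby version and on what the framework already configures, so it is worth printing the values first.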

Licensed under: CC-BY-SA with attribution