Question

I'm not sure whether this has been answered already, but a quick search didn't turn up a satisfying result.
I'm stuck with the following scenario:

  • web service with REST API and JSON formatted data blobs
  • Android client app talking to this service and locally caching / processing the data

The web service is run by a German company, so some of the strings in the result data contain special characters like German umlauts:

// example response
[
    {
         "title" : "reward 1",
         "description" : "Ein gro\u00dfer Kaffee f\u00fcr dich!"
    },
    {
         "title" : "reward 2",
         "description" : "Eine Pizza f\u00fcr dich!"
    },
    ...
]

Locally, the app parses the data using a set of classes which mirror the response objects (e.g. Reward and RewardResponse classes for the example above). Each of these classes can read itself from and dump itself to JSON. However, this is where things get ugly.

Taking the example above, org.json will correctly parse the data, and the resulting strings will contain proper Unicode versions of the special characters 'ß' (\u00df) and 'ü' (\u00fc).

final RewardResponse response = new RewardResponse(jsonData);
final Reward reward = response.get(0);

// this will print "Ein großer Kaffee für dich!"
Log.d("dump server data", reward.getDescription());

final Reward reward2 = new Reward(reward.toJSON());

// this will print "Ein gro�er Kaffee f�r dich!"
Log.d("dump reloaded data", reward2.getDescription());

As you can see there is a problem with loading the data generated by JSONObject.toString().
Mainly what's happening is that JSONObject will parse escapes in the form of "\uXXXX", but it will dump them as literal, unescaped characters.

In turn, when parsing that output again, it won't properly read the non-ASCII characters and instead inserts a replacement character into the result string (the � above, code point \ufffd).

My current workaround consists of a look-up table containing the Unicode Latin-1 Supplement characters and their respective escaped versions (\u00a0 up to \u00ff). But this also means I have to go over every dumped JSON text and replace the characters with their escaped versions each time I dump something.
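For illustration, the same escaping can be sketched without a look-up table by escaping every character outside ASCII. This is only a minimal sketch under assumptions: the class and method names (JsonAsciiEscaper, escapeNonAscii) are hypothetical and not part of org.json, and it relies on the input being valid JSON text, where non-ASCII characters can only occur inside string values.

```java
import java.util.Locale;

// Hypothetical helper (not part of org.json): post-processes serialized JSON
// so that the result contains ASCII characters only.
final class JsonAsciiEscaper {
    private JsonAsciiEscaper() {}

    /**
     * Replaces every UTF-16 code unit at or above 0x80 with a six-character
     * backslash-u escape. Characters outside the BMP end up as two escapes
     * (one per surrogate), which is how JSON represents them.
     */
    static String escapeNonAscii(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c >= 0x80) {
                // Four lowercase hex digits, zero-padded.
                out.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With such a helper, toJSON() could return JsonAsciiEscaper.escapeNonAscii(obj.toString()), so the escaping happens in one place instead of after every dump.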

Please tell me there is a better way for this!

(Note: there is a similar question; however, its author had problems with local file encoding on disk.
My problem above, as you can see, is reproducible without ever writing to disk.)

EDIT: As requested in the comments here's the toJSON() method:

public final String toJSON() throws JSONException {
    JSONObject obj = new JSONObject();

    // mTitle and mDescription contain the unmodified
    // strings received from parsing.
    obj.put("title", mTitle);
    obj.put("description", mDescription);

    return obj.toString();
}

As a side note, it makes no difference whether I use JSONObject.toString() or a JSONStringer. (The documentation advises using .toString().)

EDIT: just to remove Reward from the equation, this reproduces the problem:

final JSONObject inputData = new JSONObject("{\"description\":\"Ein gro\\u00dfer Kaffee\"}");
final JSONObject parsedData = new JSONObject(inputData.toString());

Log.d("inputData", inputData.getString("description"));
Log.d("parsedData", parsedData.getString("description"));

Solution

[Note: posted as an answer for better formatting]

I just tried the example

final JSONObject inputData = new JSONObject("{\"description\":\"Ein gro\\u00dfer Kaffee\"}");
final JSONObject parsedData = new JSONObject(inputData.toString());

Log.d("inputData", inputData.getString("description"));
Log.d("parsedData", parsedData.getString("description"));

on my Nexus 7 running Android 4.2.1 and on a Nexus S running 4.1.2, and it works as intended:

D/inputData(17281): Ein großer Kaffee
D/parsedData(17281): Ein großer Kaffee

In which Android version did you see the problem?
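When comparing devices, it may help to log the code points of the strings rather than their rendered text, so that a real U+FFFD in the data can be told apart from a display problem in the log output. The following is a minimal sketch; the class and method names (CodePointDump, describe) are hypothetical.

```java
import java.util.Locale;

// Hypothetical diagnostic helper: renders a string as a list of code points.
final class CodePointDump {
    private CodePointDump() {}

    /** Returns the code points of s as space-separated "U+XXXX" tokens. */
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format(Locale.ROOT, "U+%04X", cp));
            // Advance by one or two chars depending on surrogate pairs.
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, Log.d("parsedData", CodePointDump.describe(parsedData.getString("description"))) would show U+00DF for an intact 'ß' and U+FFFD if the character was actually replaced during parsing.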

Licensed under: CC-BY-SA with attribution