Question

I am getting an error when loading data from a web service into the datastore. The XML returned from the web service contains UTF-8 characters, and App Engine is not interpreting them correctly: it renders them as ??.

I'm fairly sure I've tracked this down to the URL Fetch request. The basic flow is: Task queue -> fetch the web-service data -> put data into datastore, so it definitely has nothing to do with the request or response encoding of the main site.

I put log messages before and after Apache Digester to see if that was the cause, but determined it was not. This is what I saw in logs:

string from the XML: "Doppelg��nger"

After digester processed: "Doppelg??nger"

Here is my url fetching code:

public static String getUrl(String pageUrl) {
    StringBuilder data = new StringBuilder();
    log.info("Requesting: " + pageUrl);
    for(int i = 0; i < 5; i++) {
        try {
            URL url = new URL(pageUrl);
            URLConnection connection = url.openConnection();
            connection.connect();
            BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
            String line;
            while ((line = reader.readLine()) != null) {
                data.append(line);
            }
            reader.close();
            break;
        } catch (Exception e) {
            log.warn("Failed to load page: " + pageUrl, e);
        }
    }
    String resp = data.toString();
    if(resp.isEmpty()) {
        return null;
    }
    return resp;
}

Is there a way I can force this to recognize the input as UTF-8? I tested the page I am loading, and the W3C validator recognized it as valid UTF-8.

The issue occurs only on the App Engine servers; it works fine in the development server.

Thanks


Solution

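Try passing the charset explicitly to the InputStreamReader. The single-argument constructor uses the platform default charset, which can differ between the development server and the App Engine servers: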

BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
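
As a rough sketch of how that one-line change might fit into the reading loop from the question, the helper below decodes a stream as UTF-8 regardless of the JVM default charset. The class name Utf8Reader and the method name readAll are hypothetical, and it assumes Java 7 or later for StandardCharsets and try-with-resources; on an older runtime the "UTF-8" string form shown above would be the equivalent.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original code.
class Utf8Reader {
    /** Reads the whole stream as UTF-8 text, ignoring the JVM default charset. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder data = new StringBuilder();
        // The explicit charset is the fix; try-with-resources closes the reader
        // even if readLine() throws.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Like the original loop, this drops the line terminators.
                data.append(line);
            }
        }
        return data.toString();
    }
}
```

Inside getUrl, the body of the try block could then be reduced to something like data.append(Utf8Reader.readAll(connection.getInputStream())), keeping the retry loop as it is.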

Other tips

I ran into the same issue three months back, Mike, and I assume your problem is the same. Let me recollect and put it down here. Feel free to add anything I miss.

My setup was Tomcat and Struts, and I resolved it through the correct configuration in Tomcat. Basically, Tomcat itself has to support UTF-8 characters: set useBodyEncodingForURI on the connector. This covers GET parameters.

In addition, you can use a filter for POST parameters. A good resource where you can find all of this under one roof is Click here!

After that I had a problem in production, where I had an Apache web server redirecting requests to Tomcat :). Similarly, UTF-8 has to be enabled there too. The moral of the story: resolve each problem as it comes :)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow