Question

I'm making a small project in Google AppEngine but I'm having problems with international chars. My program takes data from the user through the url "page.html?data1&data2..." and stores it for displaying later.

But when the user are using some international characters like åäö it gets coded as %F4, %F5 and %F6. I assume it is because only the first 128(?) chars in ASCII table are allowed in http-requests.

Is there anyone who has a good solution for this? Any simple way to decode the text? And is it better to decode it before I store the data or should I decode it when displaying it to the user.

Was it helpful?

Solution

URLs can contain anything, but it should be encoded. In Java you can use URLEncoder and URLDecoder to encode and decode urls with the desired character encoding.

Have in mind that these classes are actually meant for HTML form encoding, but they can be applied to the query string (the parameters) of the URLs, so do not use them on the whole URLs - only on the parameters.

OTHER TIPS

The URI spec (RFC 3986) restricts the characters that can be used in URIs (see the ABNF) and defines a percent-encoding scheme for transmitting "unsafe" characters. As Bozho says, the query part of the URL is usually encoded as per the HTML spec (application/x-www-form-urlencoded).

The doc for App Engine says:

App Engine uses the Java Servlet standard for web applications.

So, you should probably let the Servlet API decode the parameters for you. See the parameter methods on HttpServletRequest. This sort of encoding should generally be kept to the view layer, so data would be stored unencoded.

If you do things manually, have a look at this blog post on character handling in URIs.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top