Question

We use a web service which expects UTF-8. The framework we use on the client is Apache Axis2. We call the web service, and the SOAP body contains strings in UTF-8. The problem is that the body seems to be "double encoded". For example, take the character 'å'. Its UTF-8 representation is C3 A5, but our logs show that the (double-)encoded value sent is C3 83 C2 A5.

Has anyone experienced similar problems?

Solution

It's not entirely clear how you're calling the web service. Does the method in the web service just take a string? If so, what does your string look like in Java? All strings in Java are UTF-16 encoded, so if you're converting the UTF-8 binary representation into a string by taking each byte and turning it into a character, then that's the problem: the bytes C3 and A5 become the two characters 'Ã' and '¥', and encoding those as UTF-8 produces exactly the C3 83 C2 A5 you're seeing.
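To make the byte-per-character mistake concrete, here is a minimal sketch. The class and method names (`ByteCastDemo`, `castEachByte`, `decodeUtf8`, `hex`) are made up for illustration and are not part of Axis2; I'm only assuming your code does something roughly equivalent to `castEachByte` somewhere.

```java
import java.nio.charset.StandardCharsets;

public class ByteCastDemo {
    // Wrong: turns every UTF-8 byte into its own character.
    static String castEachByte(byte[] utf8) {
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Right: decodes the whole byte sequence as UTF-8.
    static String decodeUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated upper-case hex for comparison with the logs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-8 bytes for U+00E5 (a with ring above).
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA5};

        String wrong = castEachByte(utf8); // two characters: U+00C3 U+00A5
        String right = decodeUtf8(utf8);   // one character: U+00E5

        // What would go on the wire if each string is serialized as UTF-8.
        System.out.println(hex(wrong.getBytes(StandardCharsets.UTF_8))); // C3 83 C2 A5
        System.out.println(hex(right.getBytes(StandardCharsets.UTF_8))); // C3 A5
    }
}
```

If the string you hand to the web service looks like `wrong` here, the client library is encoding it correctly; the damage was done before the call.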

If you could show what the method you're calling looks like, and how you're calling it, that would help a lot.

For what it's worth, I've used Axis with non-ASCII strings with no problem in the past. I strongly suspect this is a problem with how you're using it rather than with Axis itself, although I'm willing to be proved wrong :)

EDIT: Based on your comment, it sounds like you've got problems receiving the HTML form data, before you hit the web service. If the user has typed "å" into the form, then that's what you should see when you debug in Eclipse. If you're putting bad data into your web service, it's no wonder you're getting bad data out at the other end. I suggest you run Wireshark to see exactly what the browser is sending you, both in terms of the raw bytes and also what character encoding it's specifying. My guess is that your web server is treating it as ISO-8859-1 but it's actually UTF-8.
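If the debugger display is hard to trust, dumping the code points of the string you got from the form is a quick check. The sketch below is only an illustration: `FormStringCheck`, `codePoints` and `redecodeAsUtf8` are hypothetical names, and the hard-coded `fromForm` value stands in for whatever your form handler actually returns. A correctly decoded "å" is the single code point U+00E5; if the server decoded UTF-8 bytes as ISO-8859-1 you'd see U+00C3 U+00A5 instead.

```java
import java.nio.charset.StandardCharsets;

public class FormStringCheck {
    // Lists the code points of a string, e.g. "U+00C3 U+00A5".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Reverses an ISO-8859-1 mis-decoding of UTF-8 bytes.
    // Meant as a diagnostic, not as a substitute for decoding correctly at the source.
    static String redecodeAsUtf8(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical value: what the form handler returns if my guess is right.
        String fromForm = "\u00C3\u00A5";

        System.out.println(codePoints(fromForm));                 // U+00C3 U+00A5
        System.out.println(codePoints(redecodeAsUtf8(fromForm))); // U+00E5
    }
}
```

If the re-decoded value comes out as U+00E5, that supports the ISO-8859-1 guess. The proper fix is then to have the server decode the request as UTF-8 in the first place; assuming the form is handled by a servlet, that usually means calling `request.setCharacterEncoding("UTF-8")` before reading any parameters.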

Once you've got the string correctly from the form, I suspect you'll find there are no problems at all in passing it on to the web service.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow