Question

I have run into a strange encoding problem in a Java web project.

System.out.println("search url: " + searchURL);    
searchURL = new String(searchURL.getBytes("utf-8"), "utf-8");
System.out.println("test===" + new String(searchURL.getBytes("utf-8")));

I tested the code above in a Java main function, and it handles Chinese characters correctly.

output:
search url: https://api.datamarket.azure.com/Data.ashx/Bing/Search/Image?Query=%27机器 猫%27&$format=json&$skip=0

test===https://api.datamarket.azure.com/Data.ashx/Bing/Search/Image?Query=%27机器 猫%27&$format=json&$skip=0

But when I run the same code in Tomcat, the second line comes out garbled.

output:
search url: https://api.datamarket.azure.com/Data.ashx/Bing/Search/Image?Query=%27机器 猫%27&$format=json&$skip=0

test===https://api.datamarket.azure.com/Data.ashx/Bing/Search/Image?Query=%27鏈哄櫒 鐚?27&$format=json&$skip=0

Then I tested this in Tomcat:

searchURL = new String(searchURL.getBytes("utf-8"), "utf-8");
System.out.println(new String(searchURL.getBytes("gbk")));
System.out.println(new String(searchURL.getBytes("gb2312")));

Both of the lines above print correctly. Why? Any suggestions would be appreciated, thanks!

Solution

The default charset differs between the JVM that ran your main function and the JVM that Tomcat runs in.

To check, print it in both environments:

System.out.println(Charset.defaultCharset());

The line below decodes the bytes with the default charset, which may or may not be UTF-8, because new String(byte[]) is called without a charset argument:

System.out.println("test===" + new String(searchURL.getBytes("utf-8")));

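So while the byte array is UTF-8, the decoder may expect something else. This also explains why the gbk and gb2312 tests print correctly: the Tomcat JVM's default charset is most likely GBK or a compatible charset, so the bytes are encoded and decoded with matching charsets.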
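
The mismatch can be reproduced without Tomcat. The following is a minimal sketch, not code from the original post: the class name CharsetRoundTrip and both helper methods are made up for illustration, and the string is written with Unicode escapes so the result does not depend on the source file's encoding. It assumes GBK is installed on the JVM and falls back to ISO-8859-1 if it is not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Encode and decode with the same explicit charset.
    // The result does not depend on the JVM's default charset.
    public static String roundTripUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode as UTF-8 but decode with a different charset. This mimics
    // new String(bytes) on a JVM whose default charset is not UTF-8.
    public static String decodeUtf8BytesAs(String s, Charset decoder) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, decoder);
    }

    public static void main(String[] args) {
        // The query text from the question, written as Unicode escapes.
        String query = "\u673A\u5668 \u732B";

        // Assumption: GBK is available; otherwise use ISO-8859-1 to
        // show a mismatched decoder.
        Charset mismatched = Charset.isSupported("GBK")
                ? Charset.forName("GBK")
                : StandardCharsets.ISO_8859_1;

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("matching:   " + roundTripUtf8(query));
        System.out.println("mismatched: " + decodeUtf8BytesAs(query, mismatched));
    }
}
```

Passing the charset explicitly on both sides, as roundTripUtf8 does, makes the code behave the same in a main function and in Tomcat. Note that System.out itself also encodes text for the console, so the console or log file encoding has to be able to display the characters as well.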

Licensed under: CC-BY-SA with attribution