I have a UTF-8 byte sequence written out as a string literal, like this: "\xE2\x80\x93".

I'm trying to convert this into the corresponding Unicode character using Java, but I have not been able to find a way to do it.

Can anyone help me with this?

Regards, Sat


Solution

System.out.println(new String(new byte[] {
    (byte)0xE2, (byte)0x80, (byte)0x93 }, "UTF-8"));

prints an en dash (U+2013), which is what those three bytes encode. It is not clear from your question whether you have those three bytes or, literally, the string you posted. If you have the string, parse it into bytes first, for example with the following:

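// Splitting on the literal "\x" leaves an empty string at index 0, so the hex pairs start at index 1.
final String[] bstrs = "\\xE2\\x80\\x93".split("\\\\x");
final byte[] bytes = new byte[bstrs.length - 1];
for (int i = 1; i < bstrs.length; i++)
  bytes[i - 1] = (byte) Integer.parseInt(bstrs[i], 16);  // the narrowing cast keeps the low 8 bits
System.out.println(new String(bytes, "UTF-8"));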
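If the input may mix ordinary text with \xHH escapes, splitting on "\x" is no longer enough. The following is a minimal sketch of one way to handle that case, not a definitive implementation: the class name EscapedUtf8Decoder and the method decode are hypothetical, and the sketch assumes that any backslash not followed by x and two hex digits should be copied through unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class EscapedUtf8Decoder {

    // Hypothetical helper: turns each \xHH escape into one raw byte,
    // copies every other character through as its UTF-8 bytes,
    // then decodes the whole byte sequence as UTF-8.
    static String decode(String input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length()) {
            boolean isEscape = input.startsWith("\\x", i)
                    && i + 4 <= input.length()
                    && Character.digit(input.charAt(i + 2), 16) >= 0
                    && Character.digit(input.charAt(i + 3), 16) >= 0;
            if (isEscape) {
                // write(int) stores only the low 8 bits of the parsed value
                out.write(Integer.parseInt(input.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // Copy one full code point so surrogate pairs stay intact
                int cp = input.codePointAt(i);
                byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                out.write(raw, 0, raw.length);
                i += Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The three escaped bytes decode to a single character, U+2013
        String decoded = decode("2010\\xE2\\x80\\x932020");
        System.out.println(decoded);
        System.out.println(Integer.toHexString(decoded.codePointAt(4)));
    }
}
```

Collecting all bytes first and decoding once at the end matters here: a multi-byte UTF-8 character is spread across several escapes, so decoding each escape on its own would produce garbage.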

Other tips

You can use StringEscapeUtils from Apache Commons Lang.

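Or, if you know that the string will always have the form \xHH\xHH..., you can strip the escapes and decode the remaining hex digits:

String hex = input.replace("\\x", "");  // the backslash must be escaped in a Java string literal
byte[] bytes = hexStringToByteArray(hex);
String result = new String(bytes, "UTF-8");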

hexStringToByteArray is here.
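Since hexStringToByteArray is not part of the JDK, here is one possible sketch of such a helper, which may differ from the linked version. It assumes the input contains only hex digits and has even length; the class name HexStrings is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class HexStrings {

    // Illustrative helper (an assumption, not the linked implementation):
    // converts a string such as "E28093" into the bytes 0xE2, 0x80, 0x93.
    static byte[] hexStringToByteArray(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have even length");
        }
        byte[] data = new byte[hex.length() / 2];
        for (int i = 0; i < data.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit near index " + (2 * i));
            }
            // Combine the two 4-bit halves into one byte
            data[i] = (byte) ((hi << 4) | lo);
        }
        return data;
    }

    public static void main(String[] args) {
        String input = "\\xE2\\x80\\x93";        // the literal characters \xE2\x80\x93
        String hex = input.replace("\\x", "");   // "E28093"
        String result = new String(hexStringToByteArray(hex), StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(result.codePointAt(0))); // prints 2013
    }
}
```

Passing StandardCharsets.UTF_8 rather than the charset name as a string avoids the checked UnsupportedEncodingException.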

Also see this other SO answer.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow