Question

I have a UTF-8 byte sequence written as a literal, like this: "\xE2\x80\x93."

I'm trying to convert this into Unicode using Java.

However, I have not been able to find a way to do this.

Can anyone help me with this?

Regards, Sat

Solution

System.out.println(new String(new byte[] {
    (byte)0xE2, (byte)0x80, (byte)0x93 }, "UTF-8"));

prints an en dash (U+2013), which is what those three bytes encode. It is not clear from your question whether you have those three bytes, or literally the string you posted. If you have the string, then simply parse it into bytes beforehand, for example with the following:

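// Split on the literal "\x"; element 0 is the empty string before the first escape.
final String[] bstrs = "\\xE2\\x80\\x93".split("\\\\x");
final byte[] bytes = new byte[bstrs.length - 1];
for (int i = 1; i < bstrs.length; i++)
  // bstrs is offset by one relative to bytes; the narrowing cast keeps the low 8 bits
  bytes[i - 1] = (byte) Integer.parseInt(bstrs[i], 16);
System.out.println(new String(bytes, "UTF-8"));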
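If the input mixes \xHH escapes with ordinary text (such as the trailing "." in the question), splitting on "\x" is not enough. The following is only a sketch of one way to handle that case, not a definitive implementation: it assumes every escape is exactly two hex digits, and the class name HexEscapeDecoder and its methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: not part of the JDK or of any library.
class HexEscapeDecoder {

    // Treats each literal "\xHH" as one raw byte, copies every other
    // character through as its UTF-8 bytes, then decodes the whole
    // buffer as UTF-8.
    static String decode(String input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length()) {
            boolean isEscape = input.startsWith("\\x", i)
                    && i + 4 <= input.length()
                    && isHex(input.charAt(i + 2))
                    && isHex(input.charAt(i + 3));
            if (isEscape) {
                // write(int) stores only the low 8 bits, i.e. one byte
                out.write(Integer.parseInt(input.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                int cp = input.codePointAt(i);
                byte[] raw = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                out.write(raw, 0, raw.length);
                i += Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    static boolean isHex(char c) {
        return Character.digit(c, 16) >= 0;
    }

    public static void main(String[] args) {
        // The Java literal below is the 13-character text \xE2\x80\x93.
        String decoded = decode("\\xE2\\x80\\x93.");
        System.out.println(decoded.equals("\u2013.")); // prints true
    }
}
```

Using StandardCharsets.UTF_8 instead of the "UTF-8" charset name avoids the checked UnsupportedEncodingException.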

Other tips

You can use the Apache Commons Lang StringEscapeUtils class.

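Or, if you know that the string will always have the form \xHH\xHH, you can strip the "\x" prefixes and decode the remaining hex digits:

String hex = input.replace("\\x", "");  // "\x" alone is not a valid Java escape
byte[] bytes = hexStringToByteArray(hex);
String result = new String(bytes, "utf-8");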

hexStringToByteArray is here.
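For readers without access to the linked helper, here is a minimal sketch of what a hexStringToByteArray method could look like. It is an assumption about the helper's behavior, not the linked implementation, and the enclosing class name HexStrings is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the linked helper.
class HexStrings {

    // Converts an even-length string of hex digits into the bytes it spells out.
    static byte[] hexStringToByteArray(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "\\xE2\\x80\\x93";       // literal text \xE2\x80\x93
        String hex = input.replace("\\x", "");  // "E28093"
        String result = new String(hexStringToByteArray(hex), StandardCharsets.UTF_8);
        System.out.println(result.equals("\u2013")); // prints true
    }
}
```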

Also see this other SO answer.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow