Question

I have the following value in a string variable in Java which has UTF-8 characters encoded like below

Dodd\u2013Frank

instead of

Dodd–Frank

(Assume that I don't have control over how this value is assigned to this string variable)

Now how do I convert (encode) it properly and store it back in a String variable?

I found the following code

Charset.forName("UTF-8").encode(str);

This returns a ByteBuffer, but I want a String back.

Edit:

Some additional information.

When I use System.out.println(str); I get

Dodd\u2013Frank

I am not sure what the correct terminology is (UTF-8 or Unicode). Pardon me for that.

Solution

Try StringEscapeUtils.unescapeJava from Apache Commons Lang:

str = org.apache.commons.lang3.StringEscapeUtils.unescapeJava(str);
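
If adding a library is not an option, the same effect can be approximated in plain Java. The following is only a minimal sketch: the class name UnicodeEscapes and the method unescape are made up for illustration, it handles only backslash-u escapes with four hex digits (not \n, \t and so on), and it does not treat a doubled backslash specially the way unescapeJava does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative and not part of any library.
final class UnicodeEscapes {
    // Matches a backslash, one or more 'u' characters, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the four hex digits into a single UTF-16 code unit.
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops '$' and backslash from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash in source yields a literal backslash at run time.
        System.out.println(unescape("Dodd\\u2013Frank"));
    }
}
```

Because each escape is decoded to one char, a supplementary character written as two consecutive escapes (a surrogate pair) comes out as the correct pair.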

OTHER TIPS

java.util.Properties

You can take advantage of the fact that java.util.Properties supports strings with '\uXXXX' escape sequences and do something like this:

Properties p = new Properties();
p.load(new StringReader("key="+yourInputString));
System.out.println("Escaped value: " + p.getProperty("key"));

Inelegant, but functional.

To handle the possible IOException, you may want a try-catch.

Properties p = new Properties();
try {
    p.load( new StringReader( "key=" + input ) );
} catch ( IOException e ) {
    e.printStackTrace();
}
System.out.println( "Escaped value: " + p.getProperty( "key" ) );
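
The Properties trick can be wrapped in a small helper, sketched below. The class and method names are hypothetical. Keep in mind that Properties.load applies its full file syntax to the input: leading whitespace in the value is dropped, a line terminator ends the value, other backslash sequences are interpreted too, and a malformed backslash-u sequence causes an IllegalArgumentException.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper class wrapping the java.util.Properties approach.
final class PropertiesUnescape {
    static String unescape(String input) {
        Properties p = new Properties();
        try {
            // Present the input as the value of a one-line properties file.
            p.load(new StringReader("key=" + input));
        } catch (IOException e) {
            // A StringReader should not fail in practice; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return p.getProperty("key");
    }
}
```

Calling PropertiesUnescape.unescape(str) on the string from the question should then give back the text with the en dash in place of the escape.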

Suppose you have a Unicode value, such as 00B0 (the degree sign, which looks similar to the superscript 'o' used in the Spanish abbreviation for 'primero', although that is a separate character, U+00BA).

Here is a function that does just what you want:

public static String unicodeToString( char charValue )
{
    // String.valueOf avoids the Character constructor, which is deprecated since Java 9.
    return String.valueOf( charValue );
}
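
If the value arrives as hex text rather than as a char, it has to be parsed first. Here is a minimal sketch under that assumption; CodePoints and fromHex are illustrative names only. It goes through Character.toChars so that code points above FFFF also work.

```java
// Hypothetical helper class; the names are illustrative.
final class CodePoints {
    // Turns a hex code point such as "00B0" into a String.
    static String fromHex(String hex) {
        int codePoint = Integer.parseInt(hex, 16);
        // toChars yields one char for BMP code points and a surrogate pair otherwise;
        // it throws IllegalArgumentException for values that are not valid code points.
        return new String(Character.toChars(codePoint));
    }
}
```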

I used StringEscapeUtils.unescapeXml to unescape a string loaded from an API that returns XML results.

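UnicodeUnescaper (in the org.apache.commons.text.translate package of org.apache.commons:commons-text) also works. Note the doubled backslash, so that the Java literal contains the escape sequence rather than the en dash itself:

new UnicodeUnescaper().translate("Dodd\\u2013Frank")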

Try

str = org.apache.commons.text.StringEscapeUtils.unescapeJava(str);

since org.apache.commons.lang3.StringEscapeUtils is deprecated in favor of Apache Commons Text.

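Perhaps the following solution, which needs no additional dependencies, will help.

This works in a Scala REPL, and should work just as well in a Java-only solution. Note that the compiler already converts the \u2013 in the literal to an en dash, so this only round-trips a string that is already unescaped.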

import java.nio.charset.StandardCharsets
import java.nio.charset.Charset

> StandardCharsets.UTF_8.decode(Charset.forName("UTF-8").encode("Dodd\u2013Frank"))
res: java.nio.CharBuffer = Dodd–Frank
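
For reference, a Java sketch of the same round trip follows; the class and method names are made up for illustration. It shows how to get a String back from the ByteBuffer, and also that an encode/decode round trip returns a string equal to its input. A value that still contains a literal backslash followed by u2013 therefore comes back unchanged and needs one of the unescaping approaches.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative.
final class RoundTrip {
    static String encodeThenDecode(String s) {
        // encode: String to UTF-8 bytes
        ByteBuffer bytes = StandardCharsets.UTF_8.encode(s);
        // decode: UTF-8 bytes back to a CharBuffer, then to a String
        return StandardCharsets.UTF_8.decode(bytes).toString();
    }
}
```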

You can convert that ByteBuffer to a String like this:

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public static String byteBufferToString(ByteBuffer buffer)
{
    // Remember the position so the caller's buffer is left unaltered.
    // position() is inherited from java.nio.Buffer.
    int oldPosition = buffer.position();
    try
    {
        // Use a fresh decoder per call; CharsetDecoder instances are not thread-safe.
        return StandardCharsets.UTF_8.newDecoder().decode(buffer).toString();
    }
    catch (CharacterCodingException e)
    {
        e.printStackTrace();
        return "";
    }
    finally
    {
        // Reset the buffer's position to its original value.
        buffer.position(oldPosition);
    }
}

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow