Question

In the project I was assigned, the original author wrote this method:

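  import java.nio.ByteBuffer;
  import java.nio.CharBuffer;
  import java.nio.charset.CharacterCodingException;

  public String asString() throws DataException
  {
    if (getData() == null) return null;

    CharBuffer charBuf = null;

    try
    {
        // Strict decoding: a fresh decoder reports malformed or unmappable input as an exception
        charBuf = s_charset.newDecoder().decode(ByteBuffer.wrap(f_data));
    }
    catch (CharacterCodingException e)
    {
        throw new DataException("You can't have a string from this ParasolBlob: " + this, e);
    }

    // Hard-coded Chinese text appended for testing
    return charBuf.toString() + "你好";
  }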

Please note that the constant s_charset is defined as:

private static final Charset s_charset = Charset.forName("UTF-8");

Please also note that I have hard-coded a Chinese string in the returned string.

Now, when execution reaches this method, it throws the following exception:

 java.nio.charset.UnmappableCharacterException: Input length = 2

More interestingly, the hard-coded Chinese string is shown as "??" in the console when I print it with System.out.println().

I think this problem is quite interesting with regard to localization. I tried changing the charset to Charset.forName("GBK"), but that does not seem to be the solution. I have also set the encoding of the Java source file to UTF-8.

Does anyone have experience with this? Any advice would be appreciated. Thanks in advance!


Solution

And more interestingly, the hard-coded Chinese strings will be shown as "??" at the console if I do a System.out.println().

System.out transcodes strings from UTF-16 to the JRE's default character encoding. If that encoding does not match the one used by the device receiving the output, the character data is corrupted; characters the default encoding cannot represent are typically written as '?'. So the console must be set to the right character encoding (UTF-8) to render the Chinese characters properly.

If you are using Eclipse, you can change the console encoding under:

Run Configurations -> Common -> Encoding (select UTF-8 from the dropdown)

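As an alternative to the IDE setting, the transcoding described above can be sketched in code. This is a minimal sketch assuming Java 7+ (for StandardCharsets); the class and method names are hypothetical. It reproduces the "??" by encoding the string with a charset that cannot represent Chinese (ISO-8859-1), then wraps standard output in a PrintStream that always encodes as UTF-8. The console still has to decode UTF-8 for the second line to display correctly.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingDemo {
    // Mimics what an output stream using charset cs does with unmappable characters:
    // String.getBytes(Charset) replaces them with the charset's replacement bytes
    // ('?' for ISO-8859-1).
    static byte[] encodeAs(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Hypothetical helper: wraps standard output in a PrintStream that always
    // encodes as UTF-8, independent of the JRE default encoding (autoflush on).
    static PrintStream utf8Stdout() throws UnsupportedEncodingException {
        return new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Written as Unicode escapes so the result does not depend on the source-file encoding.
        String hello = "\u4F60\u597D";
        byte[] latin1 = encodeAs(hello, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ??
        // Renders correctly only if the console itself decodes UTF-8.
        utf8Stdout().println(hello);
    }
}
```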

Other answers

Java Strings are Unicode:

System.out.println("你好");

As Kevin stated, the compiler reads your source file using some encoding (the one passed to javac with -encoding, or the platform default) and converts string literals to UTF-16, the encoding Java uses for String and char. So when you see "??", it is almost certainly a simple conversion error.
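To illustrate that a String holds UTF-16 code units regardless of how the source file was encoded, here is a small sketch. The class and method names are hypothetical; writing the literal as Unicode escapes is one way to sidestep the source-encoding question entirely, while compiling with javac -encoding UTF-8 is the other.

```java
import java.nio.charset.StandardCharsets;

public class StringEncodingDemo {
    // 你好 written as Unicode escapes: identical result whatever encoding javac assumes.
    static final String HELLO = "\u4F60\u597D";

    // Number of UTF-16 code units (chars) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of bytes after encoding the string as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf16Units(HELLO));                    // 2: one char per BMP character
        System.out.println(utf8Length(HELLO));                    // 6: three UTF-8 bytes per character
        System.out.println(Integer.toHexString(HELLO.charAt(0))); // 4f60
    }
}
```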

Now, if you want to convert a plain byte array to a String using a given character encoding, there is a much easier way than using a raw CharsetDecoder:

byte[] bytes = {0x61};
String string = new String(bytes, Charset.forName("UTF-8"));
System.out.println(string);

This will work, provided that the byte array really contains a UTF-8 encoded stream of bytes. It should also be without a BOM: Java's UTF-8 decoder does not strip one, so it ends up as an invisible U+FEFF character at the start of the resulting string. Make sure that what you are trying to convert does not start with the sequence 0xEF 0xBB 0xBF.
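A sketch building on the String-constructor approach, assuming you want to skip a leading UTF-8 BOM and choose between lenient and strict decoding. The names (Utf8Decode, bomLength, decodeLenient, decodeStrict) are hypothetical. The trade-off: the String constructor never throws and silently substitutes U+FFFD for invalid bytes, while a CharsetDecoder with its default REPORT action throws, which is what the question's asString() relies on to raise DataException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class Utf8Decode {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns 3 if the data starts with a UTF-8 BOM, otherwise 0.
    static int bomLength(byte[] data) {
        if (data.length < 3) return 0;
        for (int i = 0; i < 3; i++) {
            if (data[i] != UTF8_BOM[i]) return 0;
        }
        return 3;
    }

    // Lenient: invalid byte sequences become U+FFFD; no exception is thrown.
    static String decodeLenient(byte[] data) {
        int skip = bomLength(data);
        return new String(data, skip, data.length - skip, StandardCharsets.UTF_8);
    }

    // Strict: a new decoder reports errors by default, so invalid input throws
    // (MalformedInputException is a subclass of CharacterCodingException).
    static String decodeStrict(byte[] data) throws CharacterCodingException {
        int skip = bomLength(data);
        return StandardCharsets.UTF_8.newDecoder()
                .decode(ByteBuffer.wrap(data, skip, data.length - skip))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // 你好 encoded as UTF-8
        byte[] hello = {(byte) 0xE4, (byte) 0xBD, (byte) 0xA0, (byte) 0xE5, (byte) 0xA5, (byte) 0xBD};
        System.out.println(decodeStrict(hello).equals("\u4F60\u597D"));        // true
        System.out.println(decodeLenient(new byte[] {(byte) 0xFF}).equals("\uFFFD")); // true
    }
}
```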

License: CC-BY-SA with attribution
Not affiliated with StackOverflow