Question

When I call Collections.sort(List), it sorts based on String's compareTo() logic, which compares the two strings char by char.

    List<String> file1 = new ArrayList<String>();
    file1.add("1,7,zz");
    file1.add("11,2,xx");
    file1.add("331,5,yy");
    Collections.sort(file1);

My understanding is that a char holds a Unicode value. I want to know the Unicode values of chars such as , (comma). How can I find them? Is there a URL that lists these numeric values?


Solution

My understanding is that a char holds a Unicode value. I want to know the Unicode values of chars such as , (comma).

Well there's an implicit conversion from char to int, which you can easily print out:

int value = ',';
System.out.println(value); // Prints 44

This is the UTF-16 code unit for the char. (As fge notes, a char in Java is a UTF-16 code unit, not a Unicode character. There are Unicode code points greater than 65535, which are represented as two UTF-16 code units.)
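To tie this back to the sort in the question, here is a minimal sketch that prints the values involved and the resulting order. The class name SortOrderDemo and the helper sortedSample() are made up for illustration; the list contents are the ones from the question, added in reverse order so the sort has something to do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the original question or answer.
class SortOrderDemo {

    // Builds the question's sample list in reverse order and sorts it.
    static List<String> sortedSample() {
        List<String> file1 = new ArrayList<>();
        file1.add("331,5,yy");
        file1.add("11,2,xx");
        file1.add("1,7,zz");
        Collections.sort(file1); // uses String.compareTo()
        return file1;
    }

    public static void main(String[] args) {
        System.out.println((int) ','); // 44
        System.out.println((int) '1'); // 49

        // The strings first differ at index 1: ',' versus '1'.
        // compareTo() returns the difference of those two chars: 44 - 49.
        System.out.println("1,7,zz".compareTo("11,2,xx")); // -5

        System.out.println(sortedSample()); // [1,7,zz, 11,2,xx, 331,5,yy]
    }
}
```

Because ',' (44) is smaller than any digit ('0' is 48), "1,7,zz" sorts before "11,2,xx". The comparison is on char values, not on the numbers the fields happen to contain.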

Is there a URL that lists these numeric values?

Yes - for more information about Unicode, go to the Unicode web site.

OTHER TIPS

Uhm no, char is not a "unicode value" (and the word to use is Unicode code point).

A char is a code unit in the UTF-16 encoding. And it so happens that in Unicode's Basic Multilingual Plane (ie, Unicode code points ranging from U+0000 to U+FFFF, for code points defined in this range), yes, there is a 1-to-1 mapping between char and Unicode.

In order to know the numeric value of a code point you can just do:

System.out.println((int) myString.charAt(0));

But this IS NOT THE CASE for code points outside the BMP. For these, one code point translates to two chars. See Character.toChars(). And more generally, all static methods in Character relating to code points. There are quite a few!
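As a rough illustration of that last point, the sketch below takes one code point outside the BMP and looks at the two chars it turns into. The class name SurrogateDemo and the helper toUtf16() are invented for this example; everything else is the standard static API on Character.

```java
// Hypothetical demo class; only the java.lang.Character calls are real API.
class SurrogateDemo {

    // Thin wrapper so the conversion has a descriptive name in this sketch.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600; // outside the BMP
        char[] units = toUtf16(codePoint);

        System.out.println(units.length);                        // 2
        System.out.println(Integer.toHexString(units[0]));       // d83d
        System.out.println(Integer.toHexString(units[1]));       // de00
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true

        // The pair combines back into the original code point.
        System.out.println(Character.toCodePoint(units[0], units[1]) == codePoint); // true

        // A BMP code point such as the comma needs only one char.
        System.out.println(Character.charCount(0x2C)); // 1
    }
}
```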

This also means that String's .length() is actually misleading, since it returns the number of chars, not the number of code points (let alone graphemes).

Demonstration with one Unicode emoticon (the first in that page):

System.out.println(new String(Character.toChars(0x1f600)).length());

prints 2. Whereas:

final String s = new String(Character.toChars(0x1f600));
System.out.println(s.codePointCount(0, s.length()));

prints 1.
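If you want the numeric value of every code point in a string rather than every char, one way is to step through it with codePointAt() and Character.charCount(). This is only a sketch: the class name CodePointWalk and the method codePointsOf() are made up here, and the sample string is an assumption chosen to mix BMP characters with one supplementary one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration.
class CodePointWalk {

    // Collects the code points of s, advancing by 1 or 2 chars as needed.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 2 for a surrogate pair, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a," + new String(Character.toChars(0x1f600));
        System.out.println(s.length());                      // 4
        System.out.println(s.codePointCount(0, s.length())); // 3
        System.out.println(codePointsOf(s));                 // [97, 44, 128512]
    }
}
```

On Java 8 and later, s.codePoints() gives the same values as an IntStream, if you prefer that to the explicit loop.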

Licensed under: CC-BY-SA with attribution