Question

Possible Duplicate:
Java: length of string when using unicode overline to display square roots?

How do I get the number of Unicode characters in a String?

Given a char[] of Thai characters:

[อ, ภ, ิ, ช, า, ต, ิ]

This comes out in String as: อภิชาติ

String.length() returns 7. I understand that there are (technically) 7 characters, but I need a method that returns 5, which is the number of character cells actually displayed on screen.

Solution

It seems you just want to avoid counting Unicode combining marks as separate characters:

static boolean isMark(char ch)
{
    int type = Character.getType(ch);
    return type == Character.NON_SPACING_MARK ||
           type == Character.ENCLOSING_MARK ||
           type == Character.COMBINING_SPACING_MARK;
}

which can be used as follows:

String olle = "อภิชาติ";
int count = 0;

for(int i=0; i<olle.length(); i++)
{
    if(!isMark(olle.charAt(i)))
        count++;
}

System.out.println(count);

This prints 5, because the two ิ vowel signs are non-spacing marks and are skipped.
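
One caveat with the loop above: it walks the string one char at a time, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) would be counted twice. A minimal sketch of the same idea working on code points instead, assuming Java 8 or later; the class and method names (VisibleLength, countNonMarks) are made up for illustration:

```java
class VisibleLength {
    // True for the three Unicode "mark" general categories (Mn, Me, Mc).
    // Takes an int code point rather than a char, so supplementary
    // characters are classified as a whole.
    static boolean isMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.ENCLOSING_MARK
            || type == Character.COMBINING_SPACING_MARK;
    }

    // Counts the code points in s that are not combining marks.
    static int countNonMarks(String s) {
        return (int) s.codePoints().filter(cp -> !isMark(cp)).count();
    }

    public static void main(String[] args) {
        // The Thai string from the question, written with escapes so the
        // result does not depend on the source file encoding.
        String thai = "\u0E2D\u0E20\u0E34\u0E0A\u0E32\u0E15\u0E34";
        System.out.println(countNonMarks(thai)); // prints 5
    }
}
```

This is still only an approximation of "what appears on screen": it ignores marks, but it knows nothing about other multi-code-point sequences.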

Other suggestions

You can adapt the solution posted to this question here:

Unicode to string conversion in Java

By stripping the '#' character and counting the remaining characters in the string.

You can use a java.text.BreakIterator to find the gaps between the graphemes ("visual characters") and count them. Here's an example:

import java.text.BreakIterator;

..

int graphemeLength(String str) {
    BreakIterator iter = BreakIterator.getCharacterInstance();
    iter.setText(str);

    int count = 0;
    while (iter.next() != BreakIterator.DONE) count++;

    return count;
}

Now graphemeLength("อภิชาติ") will return 5.
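
If the count is not what you expect, it can help to look at where the iterator actually puts the boundaries. A rough sketch that collects the pieces between boundaries instead of just counting them; the class and method names (GraphemeSplitter, graphemes) are hypothetical, and passing Locale.ROOT is my own choice to avoid depending on the default locale:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class GraphemeSplitter {
    // Splits str at the boundaries reported by BreakIterator's
    // character instance and returns the pieces in order.
    static List<String> graphemes(String str) {
        BreakIterator iter = BreakIterator.getCharacterInstance(Locale.ROOT);
        iter.setText(str);

        List<String> clusters = new ArrayList<>();
        int start = iter.first();
        for (int end = iter.next(); end != BreakIterator.DONE; start = end, end = iter.next()) {
            clusters.add(str.substring(start, end));
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Print each piece together with the number of chars it occupies.
        for (String g : graphemes("\u0E2D\u0E20\u0E34\u0E0A\u0E32\u0E15\u0E34")) {
            System.out.println(g + " (" + g.length() + " chars)");
        }
    }
}
```

The size of the returned list is the same number graphemeLength gives. Where exactly the boundaries fall is decided by the rules built into the JDK you run on, so it is worth checking on your target version.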

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow