Question

I have a regex for names that I want to match letters with diacritics. Here is a logging snippet from my code, starting at test.java:191:

Util.Log("text = " + text);
Util.Log("regex = " + regex);
Util.Log("regexorig = " + regexorig);
Util.Log("Matches static: " + Pattern.matches(regex, text)); // Pattern.matches(regex, input): the regex comes first
Pattern p1 = Pattern.compile(regex);
Util.Log("Matches p1: " + p1.matcher(text).matches());
Pattern p2 = Pattern.compile(regexorig, Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
Util.Log("Matches p2: " + p2.matcher(text).matches());
Util.Log("String matches: " + text.matches(regex));

Here's the output when I use input "ü":

LOG: (test.java:191):text = ü
LOG: (test.java:192):regex = (?iu)[A-Z][A-Z0-9 \-.',]*
LOG: (test.java:193):regexorig = [A-Z][A-Z0-9 \-.',]*
LOG: (test.java:194):Matches static: false
LOG: (test.java:196):Matches p1: false
LOG: (test.java:198):Matches p2: false
LOG: (test.java:199):String matches: false

I can't get the regex to match letters with diacritics. Is this an Android bug, or am I missing something? According to the documentation, UNICODE_CASE is always on for case-insensitive matching on Android, so I shouldn't even need it (I'm not sure why that is, but that's a matter for a different discussion).


Solution

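[A-Z] still matches only the ASCII letters A to Z (and, with the i flag, their lowercase forms a to z). The u flag only makes case folding Unicode-aware; it does not widen the range, and ü is not a case variant of any letter in A-Z.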
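A minimal sketch to see this on a desktop JVM, assuming the question's pattern with the flags embedded inline; the class name AsciiRangeDemo and the helper matchesName are hypothetical. It shows that (?iu)[A-Z] accepts lowercase ASCII but rejects ü, while UNICODE_CASE does make ü and Ü compare equal:

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original question.
public class AsciiRangeDemo {
    // The question's pattern, with CASE_INSENSITIVE and UNICODE_CASE as inline flags.
    static final Pattern NAME = Pattern.compile("(?iu)[A-Z][A-Z0-9 \\-.',]*");

    // Hypothetical helper: does the whole input match the name pattern?
    static boolean matchesName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Case-insensitivity lets [A-Z] accept lowercase ASCII letters.
        System.out.println(matchesName("a"));       // true
        // u-umlaut is not a case variant of any letter in A-Z, so it is rejected.
        System.out.println(matchesName("\u00FC"));  // false
        // UNICODE_CASE only affects case folding: u-umlaut matches its uppercase form.
        System.out.println(Pattern.compile("(?iu)\u00FC").matcher("\u00DC").matches()); // true
    }
}
```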

Use the Unicode property \p{L} (any Unicode letter) instead. Then you don't even need the i or u modifier. For example:

\p{L}[\p{L}0-9 \-.',]*
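Here is a small sketch of that pattern in Java, assuming it is used through Pattern.compile; note that each backslash has to be doubled inside a Java string literal. The class name LetterPropertyDemo, the helper matchesName and the sample names are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical demo class illustrating the \p{L} version of the name regex.
public class LetterPropertyDemo {
    // \p{L} is any Unicode letter, so no (?i) or (?u) flag is needed.
    static final Pattern NAME = Pattern.compile("\\p{L}[\\p{L}0-9 \\-.',]*");

    // Hypothetical helper: does the whole input match the name pattern?
    static boolean matchesName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Sample inputs: u-umlaut alone, a German surname, an apostrophe, a hyphen, a leading digit.
        String[] samples = {"\u00FC", "M\u00FCller", "O'Brien", "Jos\u00E9-Luis", "1abc"};
        for (String s : samples) {
            System.out.println(s + " -> " + matchesName(s));
        }
    }
}
```

Only "1abc" is rejected, because the first character must be a letter.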

There could be one more problem, though. In Unicode, a character with a diacritic can also be represented as a sequence of code points. For instance, á can be a single code point (U+00E1), or the letter a (U+0061) followed by the combining acute accent (U+0301). \p{L} matches the base letter but not the combining mark, which belongs to the Unicode category M (Mark). To catch these cases as well, you might want to add the mark property \p{M} to the repeated character class:

\p{L}[\p{L}\p{M}0-9 \-.',]*
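A minimal sketch comparing the two patterns on a decomposed á, assuming standard java.util.regex behavior; the class name CombiningMarkDemo is hypothetical. It also shows a different technique that is not part of this answer: normalizing the input to NFC with java.text.Normalizer before matching.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical demo class comparing \p{L}-only and \p{L}\p{M} patterns.
public class CombiningMarkDemo {
    static final Pattern LETTERS_ONLY = Pattern.compile("\\p{L}[\\p{L}0-9 \\-.',]*");
    static final Pattern WITH_MARKS = Pattern.compile("\\p{L}[\\p{L}\\p{M}0-9 \\-.',]*");

    public static void main(String[] args) {
        String precomposed = "\u00E1";  // a-acute as a single code point
        String decomposed = "a\u0301";  // letter a followed by COMBINING ACUTE ACCENT

        // The combining mark is category Mn, not L, so the letters-only pattern fails.
        System.out.println(LETTERS_ONLY.matcher(decomposed).matches()); // false
        System.out.println(WITH_MARKS.matcher(decomposed).matches());   // true

        // Alternative technique (not from the answer): compose to NFC first.
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(nfc.equals(precomposed));                    // true
        System.out.println(LETTERS_ONLY.matcher(nfc).matches());        // true
    }
}
```

Normalizing alone is not a full replacement, since some letter and mark combinations have no precomposed form; keeping \p{M} in the class covers those as well.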


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow