Question

Following the answer to "How to determine whether a character is a letter in Java?", I was using the code snippet below to check whether a string begins with a Unicode letter. It worked well on Java 6, where the character \u0374 was not treated as a Unicode letter.

 boolean test = "\u0374100".matches("[\\p{L}].*");

This returns true on Java 7, whereas it returns false on Java 6.

Has anything changed in Java 7 in this respect? If so, how can I get the Java 6 behavior on Java 7?


Solution

According to Fileformat.Info: Unicode Character 'GREEK NUMERAL SIGN' (U+0374), the category is "Letter, Modifier [Lm]". It also says that the result for Character.isLetter() is Yes.

Now contrast this with Unicode Character 'GREEK LOWER NUMERAL SIGN' (U+0375) which has category "Symbol, Modifier [Sk]". According to the page the result for Character.isLetter() is No.
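To see which category the running JVM assigns to these two characters, a minimal sketch along the following lines can help. The class name `NumeralSignCategory` and the helper `categoryOf` are hypothetical names chosen for illustration; `Character.getType(int)` and `Character.isLetter(int)` are standard JDK methods. The result depends on the Unicode version bundled with the runtime, so on Java 7 or later U+0374 should be reported as Lm, while a Java 6 runtime would be expected to report Sk.

```java
public class NumeralSignCategory {

    // Maps the JVM's general category for a code point to a short label.
    // Only the two categories discussed here are named; anything else is "other".
    static String categoryOf(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.MODIFIER_LETTER:
                return "Lm";
            case Character.MODIFIER_SYMBOL:
                return "Sk";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        // U+0374 GREEK NUMERAL SIGN and U+0375 GREEK LOWER NUMERAL SIGN
        for (int cp : new int[] {0x0374, 0x0375}) {
            System.out.printf("U+%04X category=%s isLetter=%b%n",
                    cp, categoryOf(cp), Character.isLetter(cp));
        }
    }
}
```

Running this on both Java versions is a quick way to confirm that the difference comes from the character tables rather than from the regex engine.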

Java 7 uses Unicode 6.0.0 according to the Character javadoc and Internationalization Enhancements in Java SE 7, while Java 6 uses Unicode 4.0 (see the Character javadoc and the Java Language Specification 5.0, which applies to both Java 5 and 6).

The reason is that Unicode now defines U+0374 as a "Letter, Modifier". Comparing the Unicode database entries for Unicode 4.0.0 and Unicode 6.0.0 shows that the category (the third field) changed from Sk to Lm:

Version 4.0.0:

0374;GREEK NUMERAL SIGN;Sk;0;ON;02B9;;;;N;GREEK UPPER NUMERAL SIGN;Dexia keraia;;;

Version 6.0.0:

0374;GREEK NUMERAL SIGN;Lm;0;ON;02B9;;;;N;GREEK UPPER NUMERAL SIGN;;;;

In other words, your regex is working correctly; the character's definition has changed, so it is now considered a letter rather than a symbol.
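If the Java 6 result is still wanted for this particular character on Java 7, one possible workaround is to subtract U+0374 from the letter class using character-class intersection. This is only a sketch: the class name `LeadingLetterCheck` and the method `startsWithLetter` are hypothetical, and it handles just the one character discussed here. Other characters may also have changed category between Unicode 4.0 and 6.0 and would need to be excluded in the same way.

```java
import java.util.regex.Pattern;

public class LeadingLetterCheck {

    // Any letter known to the running JVM, except U+0374 (GREEK NUMERAL SIGN),
    // followed by anything. The "&&" operator intersects \p{L} with the
    // negated class, which removes that single code point.
    private static final Pattern LEADING_LETTER =
            Pattern.compile("[\\p{L}&&[^\\u0374]].*");

    // Returns true if the whole string matches, i.e. it starts with an accepted letter.
    static boolean startsWithLetter(String s) {
        return LEADING_LETTER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(startsWithLetter("\u0374100")); // string from the question
        System.out.println(startsWithLetter("abc"));
    }
}
```

Precompiling the pattern avoids recompiling the regex on every call, which `String.matches` would otherwise do.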

Licensed under: CC-BY-SA with attribution