toLowerCase() method in Java when used with Locale does not produce the exact result
Question
Look at the following code snippet in Java.
import java.util.Locale;

public final class Main
{
    public static void main(String[] args)
    {
        Locale.setDefault(new Locale("lt")); // set Lithuanian as the default locale
        String str = "\u00cc"; // Ì (LATIN CAPITAL LETTER I WITH GRAVE)
        System.out.println("Before case conversion is " + str + " and length is " + str.length());
        String lowerCaseStr = str.toLowerCase(); // no argument, so the default (Lithuanian) locale is used
        System.out.println("Lower case is " + lowerCaseStr + " and length is " + lowerCaseStr.length());
    }
}
It displays the following output.
Before case conversion is Ì and length is 1
Lower case is i̇̀ and length is 3
In the first System.out.println() statement, the result is as expected. In the second statement, however, the length is reported as 3, when I expected it to be 1. Why?
Solution
Different languages have different rules to transform to upper- or lower-case.
For example, in German, the lowercase ß becomes two uppercase S, so the word "straße" (a street), which is 6 characters long, becomes "STRASSE", which is 7 characters long.
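This is why your upper-cased and lower-cased strings have different lengths. In your case, Lithuanian keeps the dot on a lowercase i when an accent is placed above it, so Ì (U+00CC) is lower-cased to three characters: i (U+0069), a combining dot above (U+0307) and a combining grave accent (U+0300).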
I wrote about this in one of my Java Quizzes: http://thecodersbreakfast.net/index.php?post/2010/09/24/Java-Quiz-42-%3A-A-string-too-far
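The German example above can be checked with a short sketch. The class name CaseLengthDemo and the helper upperGerman are made up for illustration; only String.toUpperCase(Locale) and Locale.GERMAN come from the JDK. The locale is passed explicitly so the result does not depend on the JVM default.

```java
import java.util.Locale;

final class CaseLengthDemo {
    // Hypothetical helper: upper-cases with an explicit German locale
    // instead of relying on Locale.getDefault().
    public static String upperGerman(String s) {
        return s.toUpperCase(Locale.GERMAN);
    }

    public static void main(String[] args) {
        String word = "stra\u00dfe"; // "straße", 6 chars
        String upper = upperGerman(word);
        System.out.println(word + " has length " + word.length());
        System.out.println(upper + " has length " + upper.length());
    }
}
```

Run as is, this should report length 6 for the original word and length 7 for "STRASSE", because the single ß expands to two S characters.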
OTHER TIPS
I get a different result:
Before case conversion is Ì and length is 1
Lower case is i?? and length is 3
This is essentially a duplicate of "Does Java's toLowerCase() preserve original string length?", which has a very helpful and detailed answer. The lengths of str and str.toLowerCase() are not always the same, because the conversion depends on the code of each character and on the locale in use.
In this case the second output is "Lower case is i?? and length is 3". The i is followed by two ? marks, which stand for two additional characters that the console could not display, so the length is 3.
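To see what the two ? marks in the output above actually are, one option is to print the code points of the lower-cased string rather than the string itself. The following is only a sketch: the class name CodePointDump and the helper codePoints are invented for illustration, and Locale.forLanguageTag("lt") is used here in place of new Locale("lt") on the assumption that Java 8 or later is available.

```java
import java.util.Locale;

final class CodePointDump {
    // Hypothetical helper: formats each code point of s as U+XXXX,
    // separated by single spaces.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "\u00cc"; // Ì
        // Lithuanian rules: i + combining dot above + combining grave accent
        System.out.println(codePoints(str.toLowerCase(Locale.forLanguageTag("lt"))));
        // Locale-neutral rules: a single precomposed character
        System.out.println(codePoints(str.toLowerCase(Locale.ROOT)));
    }
}
```

With the Lithuanian locale this should print U+0069 U+0307 U+0300, and with Locale.ROOT it should print U+00EC. If the one-to-one mapping is what you want, passing an explicit locale such as Locale.ROOT to toLowerCase() avoids depending on the default locale.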