You can use java.text.Normalizer to apply Unicode normalization to Strings in Java.
On its own it does not remove accents: the NFD form only splits each accented character
into a base letter followed by combining marks. A regex can then strip those marks.
Example of accent removal:
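import java.text.Normalizer;

String accented = "árvíztűrő tükörfúrógép";
// NFD decomposes each accented letter into base letter + combining mark(s)
String normalized = Normalizer.normalize(accented, Normalizer.Form.NFD);
// Drop everything outside ASCII, which removes the combining marks
normalized = normalized.replaceAll("[^\\p{ASCII}]", "");
System.out.println(normalized);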
Output:
arvizturo tukorfurogep
More explanation here: http://docs.oracle.com/javase/tutorial/i18n/text/normalizerapi.html
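
One caveat with the example above: "[^\\p{ASCII}]" deletes every non-ASCII character, including letters that have no decomposition (for instance ß or ł). If that is not what you want, a possible variant is to remove only the combining marks with \p{M}. The following is a minimal sketch, not a definitive implementation; the class name AccentStripper and the method stripAccents are made up for illustration.

```java
import java.text.Normalizer;

class AccentStripper {
    // Hypothetical helper: decompose to NFD, then remove only combining marks.
    // Characters without a canonical decomposition (e.g. ß, ł) are left untouched.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("árvíztűrő tükörfúrógép")); // arvizturo tukorfurogep
        System.out.println(stripAccents("straße, łódź"));           // straße, łodz
    }
}
```

With the ASCII-only regex the second input would come out as "strae, odz", so which pattern to use depends on whether you need pure ASCII output or just want the accents gone.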