Question

I have a Java method that looks for a word inside a phrase, ignoring case, and removes the word if it finds it. Both the word and the phrase can be arbitrary strings. Here is my code:

private String removeWord( String phrase, String word ) {
    phrase = phrase.replaceAll( "(?i)" + word , "" );
    return phrase;
}

This works perfectly unless the word has an accent. For example, if the word is "álvarez" and the phrase is "Álvarez phrase", nothing is removed, because "(?i)" does not match the accented characters case-insensitively.

Is there a way to make "(?i)" work with accented characters?


Solution

Just replace (?i) with (?iu). The added u flag turns on Unicode-aware case-insensitive matching.
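As a rough illustration of the difference, here is a minimal sketch that runs the question's example through both flags. The class name InlineFlagDemo and the two helper methods are made up for this example, and the accented letters are written as Unicode escapes only so the result does not depend on the source file's encoding.

```java
public class InlineFlagDemo {

    // Same approach as the question, with ASCII-only case folding.
    static String removeAsciiOnly(String phrase, String word) {
        return phrase.replaceAll("(?i)" + word, "");
    }

    // The suggested fix: (?iu) adds Unicode-aware case folding.
    static String removeUnicodeAware(String phrase, String word) {
        return phrase.replaceAll("(?iu)" + word, "");
    }

    public static void main(String[] args) {
        String phrase = "\u00C1lvarez phrase"; // "Álvarez phrase"
        String word = "\u00E1lvarez";          // "álvarez"

        // (?i) alone leaves the phrase untouched, because Á and á are not ASCII.
        System.out.println("[" + removeAsciiOnly(phrase, word) + "]");

        // (?iu) removes the word, leaving the leading space and "phrase".
        System.out.println("[" + removeUnicodeAware(phrase, word) + "]");
    }
}
```

Note that the space after the removed word stays in the result, just as it does in the original method.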

OTHER TIPS

By default, (?i) performs case-insensitive matching only for US-ASCII characters; see Pattern.CASE_INSENSITIVE for details. You can combine that flag with Pattern.UNICODE_CASE like so:

phrase = Pattern.compile(word, Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE).matcher(phrase).replaceAll("");
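
Since the question says the word can be anything, it may contain regex metacharacters such as parentheses or dots. The sketch below is one possible variant of the flag-based approach that additionally wraps the word in Pattern.quote so it is matched literally. The quoting step is an addition that is not part of the answer above, and the class name QuotedWordRemover is hypothetical.

```java
import java.util.regex.Pattern;

public class QuotedWordRemover {

    static String removeWord(String phrase, String word) {
        // Pattern.quote treats the word as a literal string, not as a regex.
        // The two flags together give Unicode-aware case-insensitive matching.
        Pattern pattern = Pattern.compile(
                Pattern.quote(word),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return pattern.matcher(phrase).replaceAll("");
    }

    public static void main(String[] args) {
        // "Álvarez phrase" and "álvarez", written with Unicode escapes.
        System.out.println("[" + removeWord("\u00C1lvarez phrase", "\u00E1lvarez") + "]");

        // The parentheses are matched literally instead of forming a group.
        System.out.println("[" + removeWord("Price (USD) list", "(usd)") + "]");
    }
}
```

If the same word is removed from many phrases, the compiled Pattern could be built once and reused instead of being recompiled on every call.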
Licensed under: CC-BY-SA with attribution