Question

I have the following string:

String str = "Klaße, STRAßE, FUß";

Using a combined regex, I want to replace the German letter ß with ss or SS, respectively (ss in lower-case words, SS in upper-case words). To do this I have:

String replaceUml = str
        .replaceAll("ß", "ss")
        .replaceAll("A-Z|ss$", "SS")
        .replaceAll("^(?=^A-Z)(?=.*A-Z$)(?=.*ss).*$", "SS");

Expected result:

Klasse, STRASSE, FUSS

Actual result:

Klasse, STRAssE, FUSS

Where am I going wrong?


Solution 2

String replaceUml = str
    .replaceAll("(?<=\\p{Lu})ß", "SS")
    .replace("ß", "ss");

This uses a regex that requires a preceding Unicode upper-case letter (as in "SÜß") to produce a capital "SS".

The (?<= ... ) is a look-behind, a kind of context matching: it checks the preceding text without including it in the match. You could also do

    .replaceAll("(\\p{Lu})ß", "$1SS")

as ß never occurs at the beginning of a word.
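
To see both forms side by side, here is a minimal runnable sketch. The class and method names (SharpSVariants, withLookBehind, withCaptureGroup) are made up for illustration, and it assumes the source file is compiled as UTF-8 (the default since JDK 18) so the literal ß survives. Both variants should give the same result for the question's string:

```java
// Hypothetical demo class comparing the look-behind and capture-group forms.
class SharpSVariants {
    // Look-behind: the preceding upper-case letter is checked but not consumed.
    static String withLookBehind(String s) {
        return s.replaceAll("(?<=\\p{Lu})ß", "SS")
                .replace("ß", "ss"); // literal (non-regex) replacement of any remaining ß
    }

    // Capture group: the preceding letter is consumed and written back via $1.
    static String withCaptureGroup(String s) {
        return s.replaceAll("(\\p{Lu})ß", "$1SS")
                .replace("ß", "ss");
    }

    public static void main(String[] args) {
        String str = "Klaße, STRAßE, FUß";
        System.out.println(withLookBehind(str));   // Klasse, STRASSE, FUSS
        System.out.println(withCaptureGroup(str)); // Klasse, STRASSE, FUSS
    }
}
```

The look-behind version only inspects the preceding letter, while the capture-group version matches it and puts it back through $1; for this task the results are the same.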

Your main problem was not putting A-Z in square brackets: [A-Z].

OTHER TIPS

First of all, if you're trying to match some character in the range A-Z, you need to put it in square brackets. This

.replaceAll("A-Z|ss$", "SS")

will look for the literal three-character sequence "A-Z" in the source, which isn't what you want. Second, I think you're confused about what | means. If you say this:

.replaceAll("[A-Z]|ss$", "SS")

it will replace every upper-case letter anywhere in the string with SS, plus an "ss" at the very end of the string, because | means "match this or that" and the $ anchor belongs only to the ss alternative.
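
A small sketch (hypothetical class name AlternationDemo) that applies both patterns to the string as it looks after the first replaceAll("ß", "ss") makes the difference visible:

```java
// Hypothetical demo: how | splits a pattern into independent alternatives.
class AlternationDemo {
    static String unbracketed(String s) {
        // "A-Z" is literal text here, so only "ss" at the very end of the input matches.
        return s.replaceAll("A-Z|ss$", "SS");
    }

    static String bracketed(String s) {
        // Either ANY upper-case ASCII letter, OR "ss" at the end of the input.
        return s.replaceAll("[A-Z]|ss$", "SS");
    }

    public static void main(String[] args) {
        String afterFirstStep = "Klasse, STRAssE, FUss"; // str after replaceAll("ß", "ss")
        System.out.println(unbracketed(afterFirstStep)); // Klasse, STRAssE, FUSS
        System.out.println(bracketed(afterFirstStep));   // SSlasse, SSSSSSSSssSS, SSSSSS
    }
}
```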

A third problem with your approach is that the second and third replaceAll calls will match any ss that was already in the original string, even if it didn't come from a ß. This may or may not be what you want.
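
For example, the German word "Ass" (ace) contains no ß, yet the question's chain upper-cases its ending because ss$ matches it. A minimal sketch comparing that chain with a look-behind that only matches an actual ß (the wrapper names are hypothetical):

```java
// Hypothetical demo: a pre-existing "ss" is not safe from the question's chain.
class ExistingSsDemo {
    // The question's chain, unchanged, wrapped in a method.
    static String questionChain(String s) {
        return s.replaceAll("ß", "ss")
                .replaceAll("A-Z|ss$", "SS")
                .replaceAll("^(?=^A-Z)(?=.*A-Z$)(?=.*ss).*$", "SS");
    }

    // Look-behind version: only touches characters that really are ß.
    static String lookBehind(String s) {
        return s.replaceAll("(?<=[A-Z])ß", "SS")
                .replaceAll("ß", "ss");
    }

    public static void main(String[] args) {
        String str = "Ein Ass";                 // contains no ß at all
        System.out.println(questionChain(str)); // Ein ASS
        System.out.println(lookBehind(str));    // Ein Ass
    }
}
```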

Here's what I'd do:

String replaceUml = str
    .replaceAll("(?<=[A-Z])ß", "SS")
    .replaceAll("ß", "ss");

This first replaces each ß with SS if the character before it is an upper-case letter; any ß left over is then replaced with ss. Actually, this won't work if the character before ß is an upper-case umlaut such as Ä, because [A-Z] only covers the ASCII letters, so you should probably change it to

String replaceUml = str
    .replaceAll("(?<=[A-ZÄÖÜ])ß", "SS")
    .replaceAll("ß", "ss");
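
To check the umlaut case, this sketch (hypothetical names) runs both character classes on an all-caps word where ß follows Ü:

```java
// Hypothetical demo: [A-Z] misses upper-case umlauts, [A-ZÄÖÜ] does not.
class UmlautDemo {
    static String asciiOnly(String s) {
        return s.replaceAll("(?<=[A-Z])ß", "SS").replaceAll("ß", "ss");
    }

    static String withUmlauts(String s) {
        return s.replaceAll("(?<=[A-ZÄÖÜ])ß", "SS").replaceAll("ß", "ss");
    }

    public static void main(String[] args) {
        String str = "GRÜßE, Grüße";
        System.out.println(asciiOnly(str));   // GRÜssE, Grüsse
        System.out.println(withUmlauts(str)); // GRÜSSE, Grüsse
    }
}
```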

(There may be a better way to specify an "upper-case Unicode letter"; I'll look for it.)

EDIT:

String replaceUml = str
    .replaceAll("(?<=\\p{Lu})ß", "SS")
    .replaceAll("ß", "ss");
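
If the conversion runs often, one option is to compile the look-behind pattern once, since String.replaceAll compiles its regex on every call. A sketch with a hypothetical SharpSConverter.convert helper:

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the pattern is compiled once and reused.
class SharpSConverter {
    // ß preceded by any Unicode upper-case letter (covers Ä, Ö, Ü and more).
    private static final Pattern AFTER_UPPER = Pattern.compile("(?<=\\p{Lu})ß");

    static String convert(String s) {
        String upper = AFTER_UPPER.matcher(s).replaceAll("SS");
        // Any ß still left does not follow an upper-case letter.
        return upper.replace("ß", "ss");
    }

    public static void main(String[] args) {
        System.out.println(convert("Klaße, STRAßE, FUß")); // Klasse, STRASSE, FUSS
        System.out.println(convert("GRÜßE, Grüße"));        // GRÜSSE, Grüsse
    }
}
```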

One problem is that it won't work as intended if ß is the second character of a word whose first letter is upper-case but whose remaining letters are not. In that case you probably want lower-case "ss".

String replaceUml = str
    .replaceAll("(?<=\\b\\p{Lu})ß(?=\\P{Lu})", "ss")
    .replaceAll("(?<=\\p{Lu})ß", "SS")
    .replaceAll("ß", "ss");

Now the first call replaces ß with ss if it is preceded by an upper-case letter that is the first letter of the word and followed by a character that isn't an upper-case letter. \P{Lu} with an upper-case P matches any character other than an upper-case letter (it's the negation of \p{Lu} with a lower-case p). I also included \b to check that the upper-case letter is the first character of a word.
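
A quick sketch of the three steps. The class name and the sample word "Aßmann" are only illustrations, chosen because its ß is the second character after a capital:

```java
// Hypothetical demo of the three-step replacement.
class ThreeStepDemo {
    static String convert(String s) {
        return s
            // ß right after a word-initial capital and before a non-upper-case char -> ss
            .replaceAll("(?<=\\b\\p{Lu})ß(?=\\P{Lu})", "ss")
            // any remaining ß after an upper-case letter -> SS
            .replaceAll("(?<=\\p{Lu})ß", "SS")
            // everything else -> ss
            .replaceAll("ß", "ss");
    }

    public static void main(String[] args) {
        System.out.println(convert("Aßmann, STRAßE, FUß, Klaße"));
        // Assmann, STRASSE, FUSS, Klasse
    }
}
```

One caveat: (?=\\P{Lu}) needs a following character, so a word such as "Aß" at the very end of the input is skipped by the first step and still becomes "ASS".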

Breaking your regex into parts:

Regex

/ß/g

Description

ß Literal ß
g modifier: global. All matches (don't return on first match)

Regex

/([A-Z])ss$/g

Description

1st Capturing group ([A-Z]) 
    Char class [A-Z]  matches:
        A-Z A character range between Literal A and Literal Z
ss Literal ss
$ End of string
g modifier: global. All matches (don't return on first match)

Regex

/([A-Z]+)ss([A-Z]+)/g

Description

1st Capturing group ([A-Z]+) 
    Char class [A-Z] 1 to infinite times [greedy] matches:
        A-Z A character range between Literal A and Literal Z
ss Literal ss
2nd Capturing group ([A-Z]+) 
    Char class [A-Z] 1 to infinite times [greedy] matches:
        A-Z A character range between Literal A and Literal Z
g modifier: global. All matches (don't return on first match)

Specifically for you

String replaceUml = str
    .replaceAll("ß", "ss")
    .replaceAll("([A-Z])ss$", "$1SS")
    .replaceAll("([A-Z]+)ss([A-Z]+)", "$1SS$2");
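
A runnable check of this chain (hypothetical wrapper name). It is tailored to the question's string: for instance, $ only matches at the end of the whole input, so an upper-case word ending in ß elsewhere, as in "FUß, STRAßE", keeps a lower-case ss:

```java
// Hypothetical wrapper around the three replaceAll calls above.
class SpecificChainDemo {
    static String convert(String s) {
        return s.replaceAll("ß", "ss")                       // every ß -> ss
                .replaceAll("([A-Z])ss$", "$1SS")            // upper-case letter + ss at end of input
                .replaceAll("([A-Z]+)ss([A-Z]+)", "$1SS$2"); // ss between upper-case runs
    }

    public static void main(String[] args) {
        System.out.println(convert("Klaße, STRAßE, FUß")); // Klasse, STRASSE, FUSS
        System.out.println(convert("FUß, STRAßE"));        // FUss, STRASSE
    }
}
```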

Use String.replaceFirst() instead of String.replaceAll().

replaceAll("ß", "ss")

This replaces every occurrence of "ß", so after this statement the string is:

Klasse, STRAssE, FUss

Now replaceAll("A-Z|ss$", "SS") only replaces an "ss" at the very end of the string (the literal text "A-Z" never occurs), so your final result is:

Klasse, STRAssE, FUSS
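
The steps above can be traced with a short sketch (hypothetical names) that keeps each intermediate string. The third pattern requires the literal text "A-Z" at the start of the string, so it never matches and leaves the result unchanged:

```java
// Hypothetical tracer for the question's three replaceAll calls.
class TraceQuestionChain {
    static String[] trace(String s) {
        String step1 = s.replaceAll("ß", "ss");
        String step2 = step1.replaceAll("A-Z|ss$", "SS");
        String step3 = step2.replaceAll("^(?=^A-Z)(?=.*A-Z$)(?=.*ss).*$", "SS");
        return new String[] { step1, step2, step3 };
    }

    public static void main(String[] args) {
        for (String step : trace("Klaße, STRAßE, FUß")) {
            System.out.println(step);
        }
        // Klasse, STRAssE, FUss
        // Klasse, STRAssE, FUSS
        // Klasse, STRAssE, FUSS
    }
}
```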

To get your expected result for this input, try this:

String replaceUml = str.replaceFirst("ß", "ss").replaceAll("ß", "SS");
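
A minimal sketch of this approach (hypothetical wrapper name). Note that it chooses between ss and SS purely by position: the first ß becomes ss and every later one SS, so it only fits inputs ordered like the question's:

```java
// Hypothetical wrapper around the replaceFirst/replaceAll combination.
class ReplaceFirstDemo {
    static String convert(String s) {
        // First ß -> ss, every remaining ß -> SS; depends on position, not on case.
        return s.replaceFirst("ß", "ss").replaceAll("ß", "SS");
    }

    public static void main(String[] args) {
        System.out.println(convert("Klaße, STRAßE, FUß")); // Klasse, STRASSE, FUSS
        System.out.println(convert("STRAßE, Klaße"));      // STRAssE, KlaSSe
    }
}
```
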
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow