How to remove repeated letters in Java using regular expressions, case-insensitively

StackOverflow https://stackoverflow.com/questions/17904903

  •  04-06-2022

Question

What I have been trying to do is replace any repeated letters with the lowercase version of that letter (in Java). For example:

I want a function that maps:

bob -> bob
bOb -> bob
bOOb -> bob
bOob -> bob
boOb -> bob
bob -> bob
Bob -> Bob
bOb -> bob

However, I have not been able to do this with regexes (in Java).

I have tried the following:

    String regex = "([A-Za-z])\\1+";
    String str ="bOob";
    Pattern pattern = Pattern.compile(regex , Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    System.out.println(matcher.replaceAll("$1"));

However, this returns bOb instead of bob (it does work on boOb).

I also tried:

    Pattern pattern = Pattern.compile("([A-Za-z0-9])(?=\\1)", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    return matcher.replaceAll("");

This solves one problem (bOob -> bob now works), but it introduces another, because boOb now maps to bOb.
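To see why the lookahead attempt behaves this way, here is a minimal sketch (the class LookaheadDemo and method dropRepeats are hypothetical names). The pattern deletes every character that is immediately followed by a case-insensitive copy of itself, so only the last character of each run survives, keeping that character's original case.

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // Matches one letter or digit only if the NEXT character is the same one,
    // compared case-insensitively; the lookahead itself consumes nothing.
    private static final Pattern FOLLOWED_BY_SAME =
            Pattern.compile("([A-Za-z0-9])(?=\\1)", Pattern.CASE_INSENSITIVE);

    // Deleting each such match leaves only the last character of every run.
    static String dropRepeats(String s) {
        return FOLLOWED_BY_SAME.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(dropRepeats("bOob")); // bob: the surviving 'o' is lowercase
        System.out.println(dropRepeats("boOb")); // bOb: the surviving 'O' is uppercase
    }
}
```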

NOTE: it should also map BOobOoboObOoObooOoOoOoOoOOb -> Bobobobobob.

I feel that at this point it might be easier to loop over the string and apply some logic to each character, but I didn't want to give up on regexes just yet. If a regex solution exists, is it likely to be more efficient than a loop over each character?
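For comparison, here is a loop-based sketch under one reading of the rules above. It assumes that a run of two or more case-insensitively equal ASCII letters becomes one lowercase letter, and that a letter that is not repeated keeps its case (RunCollapser and collapseRuns are hypothetical names). It makes a single linear pass over the string; whether that is measurably faster than a regex depends on the input, so it is worth measuring if performance matters.

```java
class RunCollapser {
    // Only ASCII letters are collapsed, matching the [A-Za-z] class used above.
    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static String collapseRuns(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int j = i + 1;
            // Extend the run while the next char is the same letter, ignoring case.
            while (isAsciiLetter(c) && j < s.length()
                    && Character.toLowerCase(s.charAt(j)) == Character.toLowerCase(c)) {
                j++;
            }
            // A single char is copied unchanged; a longer run becomes one lowercase letter.
            out.append(j - i == 1 ? c : Character.toLowerCase(c));
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapseRuns("bOOb")); // bob
        System.out.println(collapseRuns("boOb")); // bob
        System.out.println(collapseRuns("Bob"));  // Bob
        System.out.println(collapseRuns("BOobOoboObOoObooOoOoOoOoOOb")); // Bobobobobob
    }
}
```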

Thanks in advance!

PS: I am aware that one could just lowercase everything before passing the string, but that's not what I want, because it maps:

Bob -> bob


Solution

Use Matcher#group() instead of $1 here

if (matcher.find()) {
    System.out.println(matcher.replaceAll(matcher.group(1)
                                          .toLowerCase()));
}

This lets you apply toLowerCase() to the captured letter.

EDIT (in response to the OP's comments):

Matcher#group(n) is the same as $n: both refer to the n-th capture group. So group(1) and $1 both capture O, except that with group(1) you can convert the capture with toLowerCase().

The looping is done by replaceAll(), not by find(). Matcher#find() is needed only to initialize the groups, so that group(1) returns the capture before replaceAll() is invoked.
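A small sketch that puts both replacements side by side (SingleCaptureDemo and its methods are hypothetical names). Because group(1) is read once, after the first find(), the same lowercased letter is used for every match that replaceAll() finds, so a second kind of repeated letter, such as a double b, is replaced with the wrong letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleCaptureDemo {
    private static final Pattern RUN =
            Pattern.compile("([A-Za-z])\\1+", Pattern.CASE_INSENSITIVE);

    // "$1" re-inserts the captured text exactly as it appears in the input.
    static String withDollarOne(String s) {
        return RUN.matcher(s).replaceAll("$1");
    }

    // group(1) is evaluated once, so its lowercased value replaces EVERY match.
    static String withFirstGroup(String s) {
        Matcher m = RUN.matcher(s);
        if (!m.find()) {
            return s; // no repeated letters
        }
        // replaceAll() resets the matcher and scans the whole input itself.
        return m.replaceAll(m.group(1).toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(withDollarOne("bOob"));   // bOb
        System.out.println(withFirstGroup("bOob"));  // bob
        System.out.println(withFirstGroup("bOobb")); // boo: the "bb" run also became "o"
    }
}
```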

But this also means the capture stays the same for every match. That is enough for your examples, but a string like BOobbOobboObbOoObbooOoOoOoOoOOb (notice the double b's) would require the matcher to be reset after each replacement. The loop then has to be driven by Matcher#find(), which means replaceAll() is traded for replaceFirst().

String regex = "([A-Za-z])\\1+";
String str = "BOobbOobboObbOoObbooOoOoOoOoOObb";

Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);

while (matcher.find()) {
    str = matcher.replaceFirst(matcher.start() > 0 ? matcher.group(1)
                                    .toLowerCase() : matcher.group(1));
    matcher.reset(str);
}

System.out.println(str); // Bobobobobob

Matcher#start() is used here to check whether the match is at the start of the input, in which case the letter's case is left untouched.
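The replaceFirst() loop rescans the whole string from the beginning after every replacement. A single-pass alternative, sketched here with the same start-of-input rule, is the appendReplacement()/appendTail() idiom that Matcher has provided since Java 1.4 (AppendReplacementDemo and collapse are hypothetical names).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    private static final Pattern RUN =
            Pattern.compile("([A-Za-z])\\1+", Pattern.CASE_INSENSITIVE);

    static String collapse(String s) {
        Matcher m = RUN.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Keep the captured case for a run at index 0; lowercase it elsewhere.
            String letter = m.start() > 0 ? m.group(1).toLowerCase() : m.group(1);
            // Copies the text before the match, then the replacement; quoting
            // stops '$' or '\' in the replacement from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(letter));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("BOobbOobboObbOoObbooOoOoOoOoOObb")); // Bobobobobob
        System.out.println(collapse("BBob")); // Bob
    }
}
```

Because find() simply continues from the end of the previous match, no reset() is needed between replacements.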

Other tips

I think this is the code I was looking for (based on the accepted answer):

public String removeRepeatedLetters(String str, boolean caseSensitive){
    if(caseSensitive){
        return this.removeRepeatedLetters(str); //uses case sensitive version
    }else{
        Pattern patternRep = Pattern.compile("([A-Za-z])(\\1+)", Pattern.CASE_INSENSITIVE);
        Matcher matcher = patternRep.matcher(str);
        String output = str;
        while(matcher.find()){
            String matchStr = matcher.group(1);
            output = matcher.replaceFirst(matchStr.toLowerCase());
            matcher = patternRep.matcher(output);
            matcher.reset();
        }
        return output;
    }   
}

It replaces any run of repeated letters (uppercase or lowercase) with a single lowercase letter.

I think it is very close to working the way I want, though it maps Bbob -> bob. I doubt that the fact it does not map to Bob will matter much for what I am using it for.

By the way, if anyone can see how to optimize this, feel free to comment! The .reset() call annoys me a little, though I am not sure whether it is necessary.
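On the .reset() question: in the code above, reset() is redundant, because patternRep.matcher(output) already returns a fresh Matcher in its initial state. If Java 9 or later is available, Matcher.replaceAll(Function<MatchResult, String>) can compute a different replacement for each match in a single pass, which removes the find()/replaceFirst() loop altogether. Below is a minimal sketch (RepeatedLetters and removeRepeatedLettersIgnoreCase are hypothetical names). It also borrows the accepted answer's start-of-input rule, so Bbob maps to Bob rather than bob.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedLetters {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern RUN =
            Pattern.compile("([A-Za-z])\\1+", Pattern.CASE_INSENSITIVE);

    // Java 9+: the lambda receives each MatchResult and returns its replacement.
    static String removeRepeatedLettersIgnoreCase(String str) {
        return RUN.matcher(str).replaceAll(mr -> {
            // A run at the very start keeps its captured case (Bbob -> Bob);
            // any other run becomes one lowercase letter.
            String letter = mr.start() == 0 ? mr.group(1) : mr.group(1).toLowerCase();
            // The returned string is parsed for $ and \ like appendReplacement, so quote it.
            return Matcher.quoteReplacement(letter);
        });
    }

    public static void main(String[] args) {
        System.out.println(removeRepeatedLettersIgnoreCase("Bbob")); // Bob
        System.out.println(removeRepeatedLettersIgnoreCase("bOob")); // bob
        System.out.println(removeRepeatedLettersIgnoreCase("BOobOoboObOoObooOoOoOoOoOOb")); // Bobobobobob
    }
}
```

Compiling the Pattern once as a static field also avoids recompiling it on every call, as the original method does.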

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow