Question

I'm trying to build a finite state machine, and I want to validate the sequence I get with a regular expression. I need to check whether the sequence has the following form:

For example:

"A,B,C,C,C,C,C,A" -> is accepted.

"A,B,C,C,C,C,A" -> is ignored.

"A,B,C,C,C,C,C,C,A" -> is ignored.

I found this post and that post, but nothing I tried works.

I tried the following: A\B\D{5}\A, ABD{5}A and a couple more, but again with no success.

EDIT: I want to know whether the C character is repeated exactly 5 times; what comes before and after doesn't matter at all, so it could also look like this:

A,A,A,F,F,R,E,D,C,C,C,C,C, ......

Don't consider the commas.

The problem is that I need to decide whether a sequence is accepted, where the sequence has the following form: A,B, C*10. I created the machine class, the state class and the event class. Now I need to know whether C is repeated exactly 5 times, and that is causing me a lot of problems.

EDIT: It's not working; see the code I've added.

String sequence1 = "A,B,C,C,C,C,A";
String sequence2 = "A,B,C,C,C,C,C,A";
String sequence3 = "A,B,C,C,C,C,C,C,A";
Pattern mPattern = Pattern.compile("(\\w)(?:,\\1){4}");
Matcher m1 = mPattern.matcher(sequence1);
m1.matches(); // FALSE
Matcher m2 = mPattern.matcher(sequence2);
m2.matches(); // FALSE
Matcher m3 = mPattern.matcher(sequence3);
m3.matches(); // FALSE

It always returns false.

How can I achieve this?

Thanks.


Solution

Your regex is not working because you are not accounting for the commas in your string, which I assume are present.

You can try the following regex (I'm posting here a generalized pattern, you can modify it accordingly): -

"(\\w)(?:,\\1){4}"

This will match any sequence of 5 identical characters separated by commas.

\1 is used to backreference the 1st matched character, and the rest of the 4 characters should be the same as that.

Explanation: -

"(         // 1st capture group
   \\w     // Start with a character
 )
 (?:       // Non-capturing group
    ,      // Match `,` after `C`
    \\1    // Backreference to 1st capture group. 
           // Match the same character as in (\\w)
 ){4}"     // Group close. Match 4 times 
           // As 1st one we have already matched in (\\w)
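
As a rough sketch of how this pattern behaves in Java (the class and method names here are made up for illustration): Matcher.matches() succeeds only when the whole input matches the pattern, whereas Matcher.find() looks for a match anywhere in the input. That may be why code calling matches() on a longer sequence prints false.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
final class FindVersusMatches {
    // Same generalized pattern as above: one character, then ",<same character>" 4 times.
    static final Pattern FIVE_IN_A_ROW = Pattern.compile("(\\w)(?:,\\1){4}");

    // True only if the ENTIRE input is exactly five identical characters.
    static boolean wholeInputMatches(String s) {
        return FIVE_IN_A_ROW.matcher(s).matches();
    }

    // True if five identical characters in a row occur ANYWHERE in the input.
    static boolean containsRun(String s) {
        return FIVE_IN_A_ROW.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "A,B,C,C,C,C,C,A";
        System.out.println(wholeInputMatches(s)); // false: "A,B," and ",A" are not part of the run
        System.out.println(containsRun(s));       // true: "C,C,C,C,C" is found inside the input
    }
}
```

Note that containsRun also returns true for six or more identical characters, since five in a row are contained in any longer run.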

UPDATE: -

If you want to match a sequence of exactly 5, you can add a negative look-ahead for the matched character after the 5th match: -

"(\\w)(?:,\\1){4}(?!,\\1)"

(?!,\\1) -> is a negative look-ahead assertion. With it, the pattern matches 5 consecutive identical characters that are not followed by the same character.
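
One way to see what the look-ahead does and does not rule out is to print where the first match starts. This is only a sketch; the class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for experimenting with the look-ahead version.
final class LookaheadOnly {
    static final Pattern P = Pattern.compile("(\\w)(?:,\\1){4}(?!,\\1)");

    // Index where the first match starts, or -1 if there is no match.
    static int firstMatchStart(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMatchStart("A,B,C,C,C,C,A"));     // -1: only four C's
        System.out.println(firstMatchStart("A,B,C,C,C,C,C,A"));   // 4: starts at the first C
        System.out.println(firstMatchStart("A,B,C,C,C,C,C,C,A")); // 6: starts at the SECOND C
    }
}
```

For the six-C input the match begins at the second C (index 6), so the look-ahead on its own still reports five in a row inside a longer run; the character before the match also has to be checked.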

UPDATE: -

In the above regex, we also need a negative look-behind for \\1, which we can't do. So I came up with this weird-looking regex, which I don't like myself, but you can try it and see whether it works: -

Not Tested: -

"(\\w),(?!\\1)(\\w)(?:,\\2){4}(?!,\\2)"

Explanation: -

(       // First Capture Group
  \\w   // Any character, before your required sequence. (e.g. `A` in `A,C,C,C,C,C`)
)       // Group end
,       // comma after `A`

(?!        // Negative look-ahead
   \\1     // The next character must differ from the one in the first captured group
)
(          // Captured group 2
   \\w     // The character that starts the sequence. 
           // Since, We now want sequence of `C` after `A`
)
(?:        // non-capturing group
   ,       // Match comma
   \\2     // match the 2nd capture group character. Which is different from `A`, 
           // and same as the one in group 2, may be `C`

){4}       // Match 4 times

(?!        // Negative look-ahead
    ,
    \\2    // for the 2nd captured group, `C`
)

I don't know whether that explanation makes sense. But you can try it; if it works and you can't understand it, I'll try to explain a little better.
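
A pattern of this shape needs some character in front of the run, so a run at the very start of the input would be missed. Below is a minimal sketch of one possible variant that also allows the run to start the input. It assumes single-character tokens separated by commas, and it relies on Java treating a backreference to a group that did not take part in the match as a failed match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: "does some character occur exactly five times in a row?"
final class ExactRunOfFive {
    static final Pattern EXACTLY_FIVE = Pattern.compile(
            "(?:^|(\\w),)"    // either start of input, or a preceding token (group 1) and a comma
          + "(?!\\1)"         // the run's character must differ from the preceding token;
                              // if group 1 is unset (start of input) this check passes
          + "(\\w)"           // group 2: the character that starts the run
          + "(?:,\\2){4}"     // four more copies, each preceded by a comma
          + "(?!,\\2)");      // not followed by a sixth copy

    static boolean containsExactRun(String s) {
        return EXACTLY_FIVE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsExactRun("A,B,C,C,C,C,A"));     // false
        System.out.println(containsExactRun("A,B,C,C,C,C,C,A"));   // true
        System.out.println(containsExactRun("A,B,C,C,C,C,C,C,A")); // false
    }
}
```

Splitting the pattern across concatenated string literals is only for readability; Pattern.COMMENTS would be another option.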

OTHER TIPS

I don't understand what you have tried, but you don't need to escape letters to match them.

I am not sure what your requirements are, but to find 5 repeated characters you can use this:

(\\p{L})(?:,\\1){4}

This would find all letters that are repeated 5 times. See it here on Regexr.

On Regexr I used \w because \p{L} is not supported there, but it is in Java.

\p{L} is a Unicode property matching every letter in any language.

  1. The idea here is to match a letter. This is done by \\p{L}.

  2. This letter is stored in a capture group because of the parentheses around it: (\\p{L}).

  3. Then there is the non-capturing group (?:,\\1). This matches a comma and the \\1 is a reference to the letter captured before.

  4. This non-capturing group is repeated 4 times (?:,\\1){4}.

==> As a result, this pattern matches 5 identical letters with commas between them.

The problem is that this expression matches at least 5 identical letters. If there are more of them, it will still match part of the longer run.

Update:

I don't see a chance to get the result directly from a regex. But here is a method to get the length indirectly:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExactlyFive {
    public static void main(String[] args) {
        String[] testInput = { "A,B,C,C,C,C,C", "A,B,C,C,C,C,C,D,E",
                "C,C,C,C,C", "C,C,C,C,C,D,E", "A,B,C,C,C,C", "C,C,C,C",
                "A,B,C,C,C,C,C,C,D,E", "C,C,C,C,C,C,D,E", "C,C,C,C,C,C" };

        // Match at least 5 identical letters in a row.
        // The letter is in group 2.
        // The complete sequence found is in group 1.
        Pattern p = Pattern.compile("((\\p{L})(?:,\\2){4,})");

        for (String t : testInput) {
            Matcher m = p.matcher(t);
            boolean exactlyFive = false;
            if (m.find()) {
                // Length of the sequence found, after the commas have
                // been removed
                int letterLength = m.group(1).replace(",", "").length();
                // Check the condition of exactly 5 equal letters
                exactlyFive = (letterLength == 5);
            }
            System.out.println(t + " ==> " + exactlyFive);
        }
    }
}

Output:

A,B,C,C,C,C,C ==> true
A,B,C,C,C,C,C,D,E ==> true
C,C,C,C,C ==> true
C,C,C,C,C,D,E ==> true
A,B,C,C,C,C ==> false
C,C,C,C ==> false
A,B,C,C,C,C,C,C,D,E ==> false
C,C,C,C,C,C,D,E ==> false
C,C,C,C,C,C ==> false
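
The method above inspects only the first sequence that find() reports. If an input might contain several runs, one possible variation is to walk over every maximal run of identical letters and compare each length. This is a sketch under the assumption that each token is a single letter; the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that checks every run, not just the first one.
final class ExactRunScanner {
    // One letter followed by as many ",<same letter>" repetitions as possible (a maximal run).
    private static final Pattern RUN = Pattern.compile("(\\p{L})(?:,\\1)*");

    static boolean hasRunOfExactly(String input, int wanted) {
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            // Number of letters in this run, once the commas are removed
            int letters = m.group().replace(",", "").length();
            if (letters == wanted) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasRunOfExactly("A,B,C,C,C,C,C,C,D,E", 5));   // false: the only long run has 6 letters
        System.out.println(hasRunOfExactly("C,C,C,C,C,C,A,A,A,A,A", 5)); // true: the run of A has exactly 5
    }
}
```

Because the quantifier is greedy, each find() call consumes a whole run, so a run of six is never reported as a run of five.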

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow