RegEx to match all characters between the last occurring upper-case word and another word in a String

StackOverflow https://stackoverflow.com/questions/21181596

  •  29-09-2022

Question

I need to match all characters between the last occurring upper-case word in a String and another word.

Input Text: The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.

RegEx used:

(?<=\b[A-Z]+\s)(.+?)(?=\sin)

The above regex gives: fox JUMPED OVER the big and (Hole 2) wall

Expected Output: the big and (Hole 2) wall

Can anyone crack this?


Solution

This might not be the most efficient solution, but it seems to work:

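import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "The CLEVER fox JUMPED OVER the big wall in the night.";
// Group 1: an upper-case word; the look-ahead rejects it if another one follows.
String regex = "(\\b[A-Z]+\\s)(?!.*\\b[A-Z]+\\b)(.+?)(\\sin)";
Matcher m = Pattern.compile(regex).matcher(text);
if (m.find()) {
    System.out.println(m.group(2)); // prints "the big wall"
}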

It uses a negative look-ahead to make sure that no further upper-case word follows the one just matched before capturing the wanted text.
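As a quick check, the sketch below wraps the same pattern in a helper and runs it against the full sentence from the question, including the "(Hole 2)" part. The class name `LastUpperWordDemo` and the method `extractAfterLastUpperWord` are made up for this illustration, and passing the end word as a parameter is an assumption that goes beyond the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastUpperWordDemo {

    // Hypothetical helper: returns the text between the last all-caps word
    // and the first following occurrence of whitespace + endWord, if any.
    public static Optional<String> extractAfterLastUpperWord(String text, String endWord) {
        Pattern p = Pattern.compile(
                "(\\b[A-Z]+\\s)"          // an upper-case word plus one whitespace
              + "(?!.*\\b[A-Z]+\\b)"      // no further upper-case word may follow
              + "(.+?)"                   // the wanted text, as short as possible
              + "(\\s" + Pattern.quote(endWord) + ")"); // whitespace + end word
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = "The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.";
        // prints "the big and (Hole 2) wall"
        System.out.println(extractAfterLastUpperWord(text, "in").orElse("<no match>"));
    }
}
```

"Hole" does not trip the look-ahead because `\b[A-Z]+\b` only accepts words made up entirely of capitals. A single capital such as "I" or "A" would count as an upper-case word, though.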

OTHER TIPS

You can simply exclude upper-case characters in your second matching expression:

(?<=\b[A-Z]+\s)([^A-Z]+)(?=\sin)

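This forces the first part to match up to The CLEVER fox JUMPED OVER; the second expression then yields the big wall, and the last one matches the only " in" sequence in your test sentence. Because the capture excludes every upper-case character, it only works when the wanted text contains none.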
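The sketch below compares the shortened sentence with the one from the question. It is an adaptation, not the expression above verbatim: the look-behind is replaced by a plain prefix plus a capturing group, since not every regex engine accepts a look-behind of unbounded length. The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoCapitalsCaptureDemo {

    // Assumption: an ordinary prefix stands in for the look-behind,
    // so the wanted text ends up in capture group 1.
    private static final Pattern PATTERN =
            Pattern.compile("\\b[A-Z]+\\s([^A-Z]+)\\sin");

    public static Optional<String> capture(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String simplified = "The CLEVER fox JUMPED OVER the big wall in the night.";
        String original = "The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.";
        System.out.println(capture(simplified).orElse("<no match>")); // the big wall
        System.out.println(capture(original).orElse("<no match>"));   // <no match>
    }
}
```

In the second sentence the capital "H" in "Hole" cuts the `[^A-Z]+` run short before " in" is reached, so nothing matches.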

How about:

[A-Z][\s.](?!.*?[A-Z])(.*)\sin

Explanation: Find a capital letter followed by whitespace or a literal dot, with no capital letter anywhere after it. Then capture anything up to, but not including, a space followed by the given word.

This captures the wanted part only.
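One detail worth trying out is that `(.*)` is greedy, so when the end word occurs more than once the capture runs up to its last occurrence. The following sketch contrasts that with a lazy variant. The second test sentence, the `LAZY` pattern, and the class name are assumptions added for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVersusLazyDemo {

    // Greedy capture, as in the expression above.
    public static final String GREEDY = "[A-Z][\\s.](?!.*?[A-Z])(.*)\\sin";
    // Hypothetical variant: identical except for the lazy quantifier.
    public static final String LAZY = "[A-Z][\\s.](?!.*?[A-Z])(.*?)\\sin";

    // Returns capture group 1 of the first match, or null if nothing matches.
    public static String capture(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String once = "The CLEVER fox JUMPED OVER the big wall in the night.";
        String twice = "The fox JUMPED OVER the wall in the night in a hurry.";
        System.out.println(capture(GREEDY, once));  // the big wall
        System.out.println(capture(GREEDY, twice)); // the wall in the night
        System.out.println(capture(LAZY, twice));   // the wall
    }
}
```

With a single " in" both variants agree; they only diverge once the end word repeats.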


How about:

^.*(?:\b[A-Z]+\b)(.+?)(?=\sin)

Explanation:

The regular expression:

(?-imsx:^.*(?:\b[A-Z]+\b)(.+?)(?=\sin))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .+?                      any character except \n (1 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    in                       'in'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
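A small Java sketch of this last expression, run against the sentence from the question. Since `(.+?)` starts directly after the upper-case word, the captured group begins with the separating space; trimming it afterwards is an assumption about what the caller wants, not part of the expression itself. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyPrefixDemo {

    private static final Pattern PATTERN =
            Pattern.compile("^.*(?:\\b[A-Z]+\\b)(.+?)(?=\\sin)");

    // Returns capture group 1 exactly as matched, or null if there is no match.
    public static String rawCapture(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.";
        String raw = rawCapture(text);
        System.out.println("[" + raw + "]");        // [ the big and (Hole 2) wall]
        System.out.println("[" + raw.trim() + "]"); // [the big and (Hole 2) wall]
    }
}
```

The greedy `^.*` first runs to the end of the line and then backtracks, which is what makes the expression settle on the last upper-case word rather than the first.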
Licensed under: CC-BY-SA with attribution