Question

I need to find the word that sits between two other words. One of them is an all-UPPER-CASE word, which can be anything, and the other is "is". I tried a few regexes, but none of them worked. Here is my example:

String :

In THE house BIG BLACK cat is very good.

Expected output :

cat

RegEx used :

(?<=[A-Z]*\s)(.*?)(?=\sis)

The above RegEx gives me BIG BLACK cat as output whereas I just need cat.


Solution 2

Try this one:

// Needs java.util.regex.Pattern and java.util.regex.Matcher
String testInput = "In THE house BIG BLACK cat is very good.";
Pattern p = Pattern.compile(
        "(?<=\\b\\p{Lu}+\\s)  # lookbehind: an uppercase word and a whitespace come before\n"
      + "\\p{L}+              # the word itself: at least one letter\n"
      + "(?=\\sis)            # lookahead: a whitespace and 'is' come after\n",
        Pattern.COMMENTS);
Matcher m = p.matcher(testInput);
if (m.find()) {
    System.out.println(m.group(0));
}

It matches only "cat".

\p{L} is a Unicode property for a letter in any language.

\p{Lu} is a Unicode property for an uppercase letter in any language.
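
To see what the Unicode properties change in practice, here is a minimal sketch that compares the ASCII-only classes with \p{Lu} and \p{L}. The class name, the helper wordBeforeIs, and the sample sentence are made up for illustration, and it uses a capturing group instead of a lookbehind only to keep the comparison short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeWordDemo {
    // ASCII-only version: [A-Z] does not cover accented uppercase letters.
    static final Pattern ASCII = Pattern.compile("[A-Z]+\\s(\\w+)\\sis");
    // Unicode version: any uppercase letter, then a word made of any letters.
    static final Pattern UNICODE = Pattern.compile("\\p{Lu}+\\s(\\p{L}+)\\sis");

    // Returns the captured word, or null when the pattern does not match.
    static String wordBeforeIs(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The escape is U+00C9, an uppercase E with an acute accent.
        String input = "Un CAF\u00C9 noir is ready.";
        System.out.println(wordBeforeIs(ASCII, input));   // no match
        System.out.println(wordBeforeIs(UNICODE, input)); // the word after the uppercase word
    }
}
```

With this input the ASCII pattern finds nothing (it prints null), because the accented letter interrupts the run of [A-Z], while the Unicode pattern prints noir.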

OTHER TIPS

One solution is to simplify your regular expression a bit,

[A-Z]+\s(\w+)\sis

and use only the first capture group (i.e., \1).

Since you came up with something more complex, I assume you understand all the parts of the above expression but for someone who might come along later, here are more details:

  • [A-Z]+ will match one or more upper-case ASCII letters
  • \s will match a single whitespace character (space, tab, newline, ...)
  • (\w+) will match one or more word characters ([a-zA-Z0-9_]) and store the match in the first match group
  • \s will match a single whitespace character
  • is will match the literal text "is"

My example is very specific and may break down for different input. Your question didn't provide many details about what other inputs you expect, so I'm not confident my solution will work in all cases.

You want to look for a condition that depends on several pieces of information and then retrieve only one specific piece of it. The straightforward way to do that in a regex is with a capturing group. In Java you can do it like this:

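import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        // Group 1 captures the word between the upper-case word and "is"
        Pattern pattern = Pattern.compile("[A-Z]+\\s(\\w+)\\sis");
        Matcher matcher = pattern.matcher("In THE house BIG BLACK cat is very good.");

        if (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}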

group(1) is the part of the pattern with parentheses around it, in this case \w+, and that's your word. The return type of group() is String, so you can use it right away.
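
To make the difference between the whole match and the captured group concrete, here is a small sketch. The class name GroupDemo, the helper allWords, and the second sample sentence are hypothetical; the helper just shows that the same group(1) call works when find() is called in a loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    static final Pattern PATTERN = Pattern.compile("[A-Z]+\\s(\\w+)\\sis");

    // Collects group(1) of every match, not just the first one.
    static List<String> allWords(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        Matcher m = PATTERN.matcher("In THE house BIG BLACK cat is very good.");
        if (m.find()) {
            System.out.println(m.group(0)); // whole match: BLACK cat is
            System.out.println(m.group(1)); // captured word only: cat
        }
        System.out.println(allWords("BIG cat is here and SMALL dog is there.")); // [cat, dog]
    }
}
```

group(0), or group() with no argument, is always the full text the pattern consumed, so the surrounding words come along unless you ask for group(1).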

The following part behaves strangely:

(?<=[A-Z]*\s)(.*?)

Because * allows zero repetitions, [A-Z]* can match an empty string, so the lookbehind is satisfied by any preceding whitespace, and (.*?) then also picks up BIG BLACK. With a few tweaks, I think the following will work (but it still matches some false positives):

(?<=[A-Z]+\s)(\w+)(?=\sis)

A slightly better regex would be:

(?<=\b[A-Z]+\s)(\w+)(?=\sis)
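
As a rough illustration of the false positives mentioned above and of what \b changes, here is a sketch. The class name, the helper, and the sample sentence are invented for the example, and it uses a capturing group instead of the lookbehind so that it does not depend on how a given Java version treats + inside a lookbehind.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Without a word boundary, [A-Z]+ may start in the middle of a mixed-case word.
    static final Pattern LOOSE = Pattern.compile("[A-Z]+\\s(\\w+)\\sis");
    // With a word boundary, the uppercase run must begin where a word begins.
    static final Pattern STRICT = Pattern.compile("\\b[A-Z]+\\s(\\w+)\\sis");

    // Returns the captured word, or null when the pattern does not match.
    static String wordBeforeIs(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // "iPhoneX" is not an all-uppercase word, but it ends in an uppercase letter.
        String input = "The iPhoneX case is cheap.";
        System.out.println(wordBeforeIs(LOOSE, input));  // case (false positive)
        System.out.println(wordBeforeIs(STRICT, input)); // null
    }
}
```

The loose pattern accepts the trailing X as the "uppercase word", while the version with \b rejects it because there is no word boundary between e and X.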

Hope it helps

String input = "In THE house BIG BLACK cat is very good.";
Pattern p = Pattern.compile("[A-Z]+\\s\\w+\\sis");
Matcher m = p.matcher(input);
if (m.find()) {
    // The whole match is "BLACK cat is"; split it on whitespace
    String[] parts = m.group().split("\\s");
    System.out.println(parts[1]); // the second element is the wanted word
}
Licensed under: CC-BY-SA with attribution