Question

I have a String

String s = "adfgadfbfgadg sa 2419sfgh";

I am trying to extract the substring

String substring = "sa 2419sfgh"; 

with Pattern and Matcher using the following regular expression and code.

formNumberRegex = "[al|sf|sa|sc|nrc|nrc form|doe|doe f|lsi|doe form psd f|doe al f]?[\\s\\-\\.]*[\\d]{3,6}[\\s\\-\\.]*[\\w]{1,4}";
formNumberRegexPattern = Pattern.compile(formNumberRegex);
formNumberMatcher = formNumberRegexPattern.matcher(s);

if (formNumberMatcher.find()) {
    String substring = formNumberMatcher.group();
}

However, I am only getting

substring = "a 2419sfgh";

What is wrong with my regular expression and/or Matcher?


Solution

Immediately, I notice:

[al|sf|sa|sc|nrc|nrc form|doe|doe f|lsi|doe form psd f|doe al f]?

should be:

(?:al|sf|sa|sc|nrc|nrc form|doe|doe f|lsi|doe form psd f|doe al f)?

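Square brackets define a character class, which matches a single character from the set rather than one of the listed words. That is why only the "a" of "sa" ended up in your match. Parentheses with | form a group that matches one whole alternative, such as sa. The "non-capturing group", (?: ), also avoids capturing that first part as group 1, so the whole match is "match group 0" and that's it.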

Tested here: http://regex101.com/r/lS9dT2
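For reference, here is a minimal, self-contained sketch of the corrected pattern applied to the string from the question. The class name FormNumberExtractor and the helper method extract are hypothetical names chosen for illustration; the regular expression itself is the one shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormNumberExtractor {
    // Optional prefix as a non-capturing group, then 3-6 digits,
    // then 1-4 word characters, with optional separators in between.
    private static final Pattern FORM_NUMBER = Pattern.compile(
        "(?:al|sf|sa|sc|nrc|nrc form|doe|doe f|lsi|doe form psd f|doe al f)?"
        + "[\\s\\-\\.]*[\\d]{3,6}[\\s\\-\\.]*[\\w]{1,4}");

    // Returns the first match in the input, or null when nothing matches.
    static String extract(String input) {
        Matcher m = FORM_NUMBER.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Input string taken from the question.
        System.out.println(extract("adfgadfbfgadg sa 2419sfgh")); // prints "sa 2419sfgh"
    }
}
```

Because the prefix group is optional, the pattern still matches a bare number such as "2419sfgh" when no known prefix precedes it.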

OTHER TIPS

You are using a character class [...]

[al|sf|sa|sc|nrc|nrc form|doe|doe f|lsi|doe form psd f|doe al f]

instead of a group

(al|sf|sa|sc|nrc|nrc form|doe|doe f|lsi|doe form psd f|doe al f)

What you used can be written as

(\\||a|l|s|f|s|a|s|c|n|r|c|n|r|c| |f|o|r|m|d|o|e|d|o|e| |f|l|s|i|d|o|e| |f|o|r|m| |p|s|d| |f|d|o|e| |a|l| |f)

A character class matches exactly one character out of those listed inside [...], so yours accepts | or a or l or s, and so on. The corrected version accepts exactly one of the alternatives separated by |, such as al or sf.
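The difference can be checked directly with a small sketch. It uses a shortened alternative list (al, sf, sa) purely to keep the example readable; the class name ClassVersusGroup and the helper matchesWhole are hypothetical names, not part of the question's code.

```java
import java.util.regex.Pattern;

class ClassVersusGroup {
    // Shortened alternative list, for illustration only.
    static final Pattern CHAR_CLASS = Pattern.compile("[al|sf|sa]");
    static final Pattern GROUP = Pattern.compile("(al|sf|sa)");

    // True only when the pattern matches the entire input.
    static boolean matchesWhole(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(CHAR_CLASS, "a"));  // true: one listed character
        System.out.println(matchesWhole(CHAR_CLASS, "|"));  // true: | is a literal inside [...]
        System.out.println(matchesWhole(CHAR_CLASS, "sa")); // false: two characters
        System.out.println(matchesWhole(GROUP, "sa"));      // true: a whole alternative
        System.out.println(matchesWhole(GROUP, "a"));       // false: not an alternative
    }
}
```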

So change your regex to

String formNumberRegex = "(al|sf|sa|sc|nrc|nrc form|doe|doe f|lsi|doe form psd f|doe al f)?[\\s\\-\\.]*[\\d]{3,6}[\\s\\-\\.]*[\\w]{1,4}";
Licensed under: CC-BY-SA with attribution