Question

I want to split an input string into blocks starting with \begin{<word>} and ending with \end{<word>} where <word> can be "block", "vers" or "refr" and do addBlock() on each block. When trying this method on a string containing two of these blocks, m.groupCount() correctly returns 2, but m.find() returns false. How can this be? m.group() throws an exception.

private void addBlocks(String in) {
    Pattern p = Pattern.compile("\\\\begin\\{(vers|refr|block)\\}.*\\\\end\\{(vers|refr|block)\\}");
    Matcher m = p.matcher(in);
    while (m.find()) {
        addBlock(m.group());
    }
}

Edit: Yep, there were several things wrong there. Regex is a pain in the ass, it isn't very intuitive, and there is not that much sensible help online. Here is the code that finally worked:

private void addBlocks(String in) {
    Pattern p = Pattern.compile("\\\\begin\\{(block|vers|refr)\\}(.|$)*?\\\\end\\{(block|vers|refr)\\}", Pattern.DOTALL);
    Matcher m = p.matcher(in);
    while (m.find()) {
        addBlock(m.group());
    }
    }
}

Solution

In general, your code works for me, at least for this test call:

addBlocks("foo bar \\begin{vers}bla\\end{vers}foo bar baz \\begin{refr}bla2\\end{refr} bla");

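However, with your regular expression addBlock() is called at most once per line of input: the greedy * quantifier makes .* run from the first \begin{...} to the last \end{...}, so everything in between ends up in one single match. You probably want the reluctant *? quantifier instead: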

Pattern p = Pattern.compile("\\\\begin\\{(vers|refr|block)\\}.*?\\\\end\\{(vers|refr|block)\\}");

With the *? quantifier you’ll get two matches for the above test call.
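To see the difference side by side, here is a small sketch that runs both patterns over the same test string. The class name BlockSplitterSketch and the helper findBlocks() are made up for this illustration and are not part of the original code; the two regular expressions are the ones from the question and from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for comparing the two quantifiers.
final class BlockSplitterSketch {
    // Greedy: .* runs on to the LAST \end{...} it can reach.
    static final Pattern GREEDY = Pattern.compile(
            "\\\\begin\\{(vers|refr|block)\\}.*\\\\end\\{(vers|refr|block)\\}");

    // Reluctant: .*? stops at the FIRST \end{...} after each \begin{...}.
    static final Pattern RELUCTANT = Pattern.compile(
            "\\\\begin\\{(vers|refr|block)\\}.*?\\\\end\\{(vers|refr|block)\\}");

    // Collects every full match of p in the input, in order.
    static List<String> findBlocks(Pattern p, String in) {
        List<String> blocks = new ArrayList<>();
        Matcher m = p.matcher(in);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String in = "foo bar \\begin{vers}bla\\end{vers}foo bar baz "
                + "\\begin{refr}bla2\\end{refr} bla";
        System.out.println(findBlocks(GREEDY, in).size());    // one long match
        System.out.println(findBlocks(RELUCTANT, in).size()); // two separate blocks
    }
}
```

With the greedy pattern the single match spans from \begin{vers} all the way to \end{refr}; with the reluctant one each block is returned on its own.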

If there is no match on some input, then m.find() will correctly return false and m.group() will not be called (and thus won’t throw any IllegalStateException). Independent of the input string, m.groupCount() will always be 2 for your particular regular expression, as there are 2 capturing groups in the pattern.
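A quick sketch of both points, using hypothetical names (GroupCountSketch, countMatches) that are not in the original code. It also shows one possible reason for find() returning false on real input: by default . does not match line terminators, so a block that spans several lines is only found with Pattern.DOTALL. That this was the actual cause in the question is my guess; the multi-line input below is invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: groupCount() versus the number of matches.
final class GroupCountSketch {
    static final String REGEX =
            "\\\\begin\\{(vers|refr|block)\\}.*?\\\\end\\{(vers|refr|block)\\}";

    // Counts how often find() succeeds on the given input.
    static int countMatches(Pattern p, String in) {
        Matcher m = p.matcher(in);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Pattern plain = Pattern.compile(REGEX);
        Pattern dotAll = Pattern.compile(REGEX, Pattern.DOTALL);

        // groupCount() depends only on the pattern, not on the input.
        System.out.println(plain.matcher("no blocks here").groupCount());

        // Assumed multi-line input: the block body contains line breaks.
        String multiLine = "\\begin{vers}\nline one\nline two\n\\end{vers}";
        System.out.println(countMatches(plain, multiLine));  // . stops at '\n'
        System.out.println(countMatches(dotAll, multiLine)); // DOTALL lets . cross lines
    }
}
```

So groupCount() stays at 2 even for input without any block, while the number of actual matches has to be counted by calling find() in a loop.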

OTHER TIPS

This will never give more than one result, because the .* is greedy: it eats every character up to the last closing tag, so all blocks end up in a single match.

groupCount() doesn't return the number of matches, but the number of capturing groups. Also explained here: https://stackoverflow.com/a/2989061/2947592

Licensed under: CC-BY-SA with attribution