Question

I can't figure out what the longest_match option does in Google's re2 library.

Explanations which are incorrect:

  • when applying /a*/ to "aaaaa", setting longest-match makes the Kleene closure greedy so that the regex matches the whole string "aaaaa" instead of the empty string "".

  • when applying /a*/ to "abaa", setting longest-match makes the regex match the longest match "aa" (starting at index 2) instead of the leftmost match "a" (starting at index 0)

I haven't been able to find anyone on the internet who discusses this, which suggests that I'm the only one confused. Can anyone help me out?

Was it helpful?

Solution

Usually POSIX regular expressions return the leftmost longest match. That is (assuming you only search for the first match, which is the default behavior for most regex implementations) the regex /a*/ will return "a" when applied to "abaa" because the leftmost a (position 0) matches. The regex declares success before having seen the aa that follow in position 2.

Other regex libraries (PCRE for example) return the first possible match, so for example

a(b|bb)

will match "ab" in the string "abb" because the first alternation already matches.

Now I don't know the re2 library, but I assume that the longest_match option will collect all possible matches from a string and then return the longest one of them.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top