Question

I want to match:

first second

and

second first

so the regular expression:

re.match(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))', 'first second')

matches, but this one:

re.match(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))', 'second first')

does not match. Is this a bug with backreferences in A|B alternations?


Solution 2

How about:

(?=.*(?P<f>first))(?=.*(?P<s>second))

(?=...) is a positive lookahead: it asserts that the word first is present somewhere ahead in the string without making it part of the match (it is a zero-length assertion). The same applies to second.

This regex matches if both first and second occur in the string, in any order. Note that the match itself is empty, and that strings containing other text besides the two words are accepted as well.
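A minimal sketch of how the lookahead pattern behaves, assuming it is applied with re.match at the start of the string. The sample strings are made up for illustration and are not from the question.

```python
import re

# Two zero-width lookaheads: each scans ahead for its word without consuming input.
pattern = re.compile(r'(?=.*(?P<f>first))(?=.*(?P<s>second))')

# Illustrative inputs (assumed, not from the original question).
samples = ['first second', 'second first', 'first only']
found = {text: pattern.match(text) is not None for text in samples}
for text, ok in found.items():
    print(f'{text!r}: {ok}')

# The overall match is empty, but the named groups still hold the words.
m = pattern.match('second first')
print(repr(m.group()), m.group('f'), m.group('s'))
```

Because the match is zero-length, m.group() is of little use here; read the named groups instead, or add anchors if only the two exact strings should be accepted.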

OTHER TIPS

You've misunderstood how backreferences work. For a backreference to match anything, the group it refers to must already have matched.

In your second example, the first alternative fails, so neither the (?P<f>first) nor the (?P<s>second) group captured anything, and the (?P=s) and (?P=f) backreferences cannot match anything either.
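The behaviour can be checked directly. The sketch below reuses the pattern from the question; the second pattern, echo, is a hypothetical example added only to show that a backreference repeats text a group has already captured.

```python
import re

# Pattern from the question: the backreferences sit in the second alternative,
# where groups f and s have not captured anything.
pattern = re.compile(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))')
forward = pattern.match('first second')
reverse = pattern.match('second first')
print(forward)   # a match object
print(reverse)   # None: (?P=s) refers to a group that never matched

# Hypothetical pattern: (?P=w) can only repeat what group w captured.
echo = re.compile(r'(?P<w>first|second) (?P=w)')
same = echo.match('first first')
different = echo.match('first second')
print(same is not None, different is not None)
```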

Back references are the wrong tool here; you'll have to repeat at least one of your groups, literally:

r'(?:(?P<f>first )?(?P<s>second)(?(f)| first))'

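This uses a conditional pattern, (?(f)...|...): " first" is matched after second only if the f group did not match before second. The session below uses a variant with $ in the yes-branch, so that nothing may follow second when first came before it: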

>>> import re
>>> pattern = re.compile(r'(?:(?P<f>first )?(?P<s>second)(?(f)$| first))')
>>> pattern.match('first second').group()
'first second'
>>> pattern.match('second first').group()
'second first'
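As a further check, the sketch below runs the conditional pattern from the session above against a few inputs with fullmatch, and compares it with a plain alternation that repeats both words literally. The CASES list and the plain pattern are assumptions added for illustration, not part of the original answer.

```python
import re

# Conditional pattern taken from the session above.
conditional = re.compile(r'(?:(?P<f>first )?(?P<s>second)(?(f)$| first))')

# Assumed alternative: spell out both orders literally.
plain = re.compile(r'first second|second first')

# Illustrative inputs, including two that should be rejected.
CASES = ['first second', 'second first', 'second', 'first second first']

results = {text: conditional.fullmatch(text) is not None for text in CASES}
plain_results = {text: plain.fullmatch(text) is not None for text in CASES}

for text in CASES:
    print(f'{text!r}: conditional={results[text]} plain={plain_results[text]}')
```

For just two fixed words the plain alternation is probably easier to read; the conditional form may pay off when the repeated parts are longer sub-patterns.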
Licensed under: CC-BY-SA with attribution