Question

I am using Python with the re module and trying to match strings like decimal(4,1) and decimal(10,5), while only actually returning the 4,1 and 10,5, with the following regular expression:

(?<=decimal\()\d+,\d+(?=\)$)

Let's say I compile the regex with re.compile and name it DECIMAL. If I try to search decimal(4,1) for instances of the regex like so:

DECIMAL = re.compile(r'(?<=decimal\()\d+,\d+(?=\)$)')
results = DECIMAL.search('decimal(4,1)')

results.group(0) returns the string 4,1 as desired. However, if I try to match rather than search:

results = DECIMAL.match('decimal(4,1)')

results evaluates to None.

Does the match method fail here because match looks to fully match the consuming part of the regex against the beginning of the haystack and thus doesn't have any room for a preceding positive-length pattern to confirm?

As for the immediately practical, simply searching won't work in this case, since DECIMAL would turn up results in unacceptable strings like snarfdecimal(4,1). Should I be dropping in a beginning-of-string token somewhere, or is there something else I'm missing entirely?

Solution 2

You really don't need a positive look-behind at all; a capturing group does the job:

>>> import re
>>> find_decimal = re.compile(r'decimal\((\d+,\d+)\)')
>>> find_decimal.match('decimal(4,1)').group(1)
'4,1'

As for why match fails with your original pattern, I'm not sure, but I'd guess your reasoning is correct.
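If the whole string has to be exactly decimal(...), as the $ in the question's pattern suggests, one option is fullmatch (available since Python 3.4). This is only a sketch: the helper name decimal_args is made up for illustration and is not part of the original answer.

```python
import re

# Same pattern as above; the capturing group holds the "4,1" part.
DECIMAL_ARGS = re.compile(r'decimal\((\d+,\d+)\)')


def decimal_args(text):
    """Return the digits inside decimal(...), or None if text is anything else.

    Hypothetical helper: fullmatch requires the pattern to cover the entire
    string, so both leading and trailing junk are rejected.
    """
    m = DECIMAL_ARGS.fullmatch(text)
    return m.group(1) if m else None


print(decimal_args('decimal(4,1)'))
print(decimal_args('snarfdecimal(4,1)'))
print(decimal_args('decimal(4,1) extra'))
```

With fullmatch, neither a lookbehind nor explicit ^ and $ anchors are needed.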

OTHER TIPS

Unlike search(), Python's match() method only attempts a match at the beginning of the string. With your pattern, that means the lookbehind needs the literal text decimal( to sit before position 0, where there are no characters at all, so it will always fail.
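A minimal sketch of the difference, reusing the DECIMAL pattern from the question (the variable names search_result and match_result are only illustrative):

```python
import re

DECIMAL = re.compile(r'(?<=decimal\()\d+,\d+(?=\)$)')
text = 'decimal(4,1)'

# search() may start the match at any index, so it can begin at the "4",
# where the lookbehind finds "decimal(" immediately to the left.
search_result = DECIMAL.search(text)
print(search_result.group(0), search_result.start())

# match() only tries index 0. Nothing precedes index 0, so the lookbehind
# cannot succeed and the result is None.
match_result = DECIMAL.match(text)
print(match_result)
```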

But as Jared pointed out, you don't need lookbehind for this anyway. In fact, lookbehind should be the last tool you reach for, not the first.

Here's a slightly modified version of Jared's regex:

r'\bdecimal\(\s*(\d+\s*,\s*\d+)\s*\)'

The most important change is the addition of the word boundary (\b), which prevents it from matching things like snarfdecimal(4,1). If you really have to use match() instead of search(), you can "pad" the regex with .*?, which consumes whatever precedes decimal so that the match can still start at the beginning of the string:

r'.*?\bdecimal\(\s*(\d+\s*,\s*\d+)\s*\)'
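To see how the two patterns behave, here is a short sketch. The names FIND_DECIMAL and PADDED and the sample strings are assumptions made for illustration.

```python
import re

# Word-boundary version, meant for search().
FIND_DECIMAL = re.compile(r'\bdecimal\(\s*(\d+\s*,\s*\d+)\s*\)')
# Padded version, meant for match(): .*? lazily consumes any leading text.
PADDED = re.compile(r'.*?\bdecimal\(\s*(\d+\s*,\s*\d+)\s*\)')

# \b needs a word/non-word transition before "decimal", so a bare or
# space-prefixed "decimal(" matches, but "snarfdecimal(" does not.
plain = FIND_DECIMAL.search('decimal(4,1)')
spaced = FIND_DECIMAL.search('col decimal( 10 , 5 )')
glued = FIND_DECIMAL.search('snarfdecimal(4,1)')

# match() works with the padded pattern even when text precedes "decimal(".
padded_ok = PADDED.match('col decimal(10,5)')
padded_glued = PADDED.match('snarfdecimal(4,1)')

print(plain.group(1))
print(spaced.group(1))  # inner whitespace is part of the captured group
print(glued, padded_ok.group(1), padded_glued)
```

Note that the group captures any whitespace around the comma as written, so it may need stripping before the numbers are parsed.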
Licensed under: CC-BY-SA with attribution