Question

There is a well-known pattern for getting a captured group's value, or an empty string if there is no match:

match = re.search('regex', 'text')
if match:
    value = match.group(1)
else:
    value = ""

or:

match = re.search('regex', 'text')
value = match.group(1) if match else ''

Is there a simple and pythonic way to do this in one line?

In other words, can I provide a default for a capturing group in case it's not found?


For example, I need to extract all alphanumeric characters (and _) from the text after the key= string:

>>> import re
>>> PATTERN = re.compile(r'key=(\w+)')
>>> def find_text(text):
...     match = PATTERN.search(text)
...     return match.group(1) if match else ''
... 
>>> find_text('foo=bar,key=value,beer=pub')
'value'
>>> find_text('no match here')
''

Is it possible for find_text() to be a one-liner?

It is just an example, I'm looking for a generic approach.
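One generic approach is a small wrapper that takes any pattern (a string or a compiled pattern) and returns the requested group, or a default. This is a minimal sketch; the helper name search_group and its parameters are hypothetical and not part of the re module.

```python
import re


def search_group(pattern, text, group=1, default=''):
    """Return `group` from the first match of `pattern` in `text`, or `default`.

    Hypothetical helper for illustration; not part of the standard re module.
    `pattern` may be a string or a compiled pattern (re.search accepts both).
    """
    match = re.search(pattern, text)
    if match is None:
        return default
    value = match.group(group)
    # An optional group that did not take part in the match yields None.
    return default if value is None else value


print(repr(search_group(r'key=(\w+)', 'foo=bar,key=value,beer=pub')))  # 'value'
print(repr(search_group(r'key=(\w+)', 'no match here')))               # ''
print(repr(search_group(r'key=(\w+)', 'no match here', default='N/A')))  # 'N/A'
```

The call site becomes a one-liner, and the two-line check lives in one place.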


Solution

Quoting from the Match Objects documentation:

Match objects always have a boolean value of True. Since match() and search() return None when there is no match, you can test whether there was a match with a simple if statement:

match = re.search(pattern, string)
if match:
   process(match)
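A quick check of the quoted behaviour: re.search returns None on failure, and any match object is truthy, even one whose group matched an empty string. The sample strings are illustrative.

```python
import re

hit = re.search(r'key=(\w*)', 'key=')          # matches; group 1 is ''
miss = re.search(r'key=(\w*)', 'nothing here')  # no match at all

print(bool(hit), repr(hit.group(1)))  # True ''
print(miss is None)                    # True
```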

Since there is no other option, and since you are already using a function, I would like to present this alternative:

def find_text(text, matches=lambda x: x.group(1) if x else ''):
    return matches(PATTERN.search(text))

assert find_text('foo=bar,key=value,beer=pub') == 'value'
assert find_text('no match here') == ''

It does exactly the same thing; the only difference is that the check you need has been moved into a default parameter.
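Because the check lives in a default parameter, a caller can pass a different extractor without changing the function. A sketch, redefining PATTERN from the question; the upper_or_none extractor is purely hypothetical.

```python
import re

PATTERN = re.compile(r'key=(\w+)')


def find_text(text, matches=lambda x: x.group(1) if x else ''):
    # `matches` receives whatever PATTERN.search returns: a match object or None.
    return matches(PATTERN.search(text))


def upper_or_none(m):
    # Hypothetical alternative extractor: upper-case value, None on no match.
    return m.group(1).upper() if m else None


print(find_text('foo=bar,key=value,beer=pub'))                # value
print(find_text('foo=bar,key=value', matches=upper_or_none))  # VALUE
print(find_text('no match here', matches=upper_or_none))      # None
```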

Building on @Kevin's solution and @devnull's suggestions in the comments, you can do something like this:

def find_text(text):
    return next((item.group(1) for item in PATTERN.finditer(text)), "")

This takes advantage of the fact that next() accepts, as its second argument, a default to return when the iterator is exhausted. However, it has the overhead of creating a generator expression on every call, so I would stick with the first version.
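A small sanity check, assuming the same PATTERN as in the question, that the default-parameter version and the next()/finditer version agree, including when the key appears more than once (both return the first occurrence):

```python
import re

PATTERN = re.compile(r'key=(\w+)')


def find_text_default_param(text, matches=lambda x: x.group(1) if x else ''):
    return matches(PATTERN.search(text))


def find_text_next(text):
    # finditer yields match objects lazily; next() takes only the first one,
    # or returns "" if the iterator is empty.
    return next((item.group(1) for item in PATTERN.finditer(text)), "")


samples = ['foo=bar,key=value,beer=pub', 'no match here', 'key=a,key=b']
for s in samples:
    assert find_text_default_param(s) == find_text_next(s)

print(next(iter([]), 'fallback'))  # fallback
```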

OTHER TIPS

You can also change the pattern itself: add an empty alternative, $ (the end of the string), to the capture group, so that the search always succeeds and group(1) is '' when the key is missing:

>>> re.search(r'((?<=key=)\w+|$)', 'foo=bar,key=value').group(1)
'value'
>>> re.search(r'((?<=key=)\w+|$)', 'no match here').group(1)
''
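Because the $ branch can always match at the end of the input, search with this pattern never returns None, so calling .group(1) is always safe. A sketch that compiles the pattern once; the name KEY_OR_EMPTY is illustrative.

```python
import re

# Either word characters directly after "key=", or the empty string at the end.
KEY_OR_EMPTY = re.compile(r'((?<=key=)\w+|$)')


def find_text(text):
    # search() always finds something thanks to the $ alternative.
    return KEY_OR_EMPTY.search(text).group(1)


for text in ['foo=bar,key=value', 'no match here', '', 'key=']:
    print(repr(find_text(text)))
# 'value'
# ''
# ''
# ''
```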

It's possible to refer to the result of a function call twice in a single one-liner: create a lambda expression and call it immediately, passing the function call as its argument.

value = (lambda match: match.group(1) if match else '')(re.search(regex,text))

However, I don't consider this especially readable. Code responsibly: if you're going to write tricky code, leave a descriptive comment!
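The lambda's parameter acts as a local name for the search result, so it can be used more than once without running the regex again. A sketch with an illustrative input string that pulls out both the value and its position:

```python
import re

text = 'foo=bar,key=value,beer=pub'

# `m` names the search result; it is used twice below (group and start).
value, pos = (lambda m: (m.group(1), m.start(1)) if m else ('', -1))(
    re.search(r'key=(\w+)', text))

print(value, pos)  # value 12
```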

One-line version:

if re.findall(pattern,string): pass

The issue here is that findall returns a list of all matches, so you need to either handle multiple matches or ensure that your pattern can only match once. Expanded version:

# matches is a list
matches = re.findall(pattern,string)

# condition on the list fails when list is empty
if matches:
    pass
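What findall returns depends on the groups in the pattern: with no groups, a list of whole matches; with one group, a list of that group's text (with several groups, a list of tuples). An empty list is falsy, which is what the if check relies on, and "or ['']" supplies a default first element. A sketch with illustrative strings:

```python
import re

text = 'key=a,key=b'

print(re.findall(r'key=\w+', text))    # ['key=a', 'key=b']  (no group: whole matches)
print(re.findall(r'key=(\w+)', text))  # ['a', 'b']          (one group: group text)

matches = re.findall(r'key=(\w+)', 'no match here')
print(bool(matches))                   # False (empty list)

# First match, or '' when the list is empty.
first = (re.findall(r'key=(\w+)', text) or [''])[0]
print(first)                           # a
```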

So for your example "extract all alphanumeric characters (and _) from the text after the key= string":

import re

# Returns the word characters after the first "key=", or '' if there is none
def find_text(text):
    return (re.findall(r"(?<=key=)[a-zA-Z0-9_]*", text) or [''])[0]

One line for you, although not quite Pythonic.

find_text = lambda text: (lambda m: m and m.group(1) or '')(PATTERN.search(text))
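The "m and m.group(1) or ''" idiom has a known pitfall: if the group can match an empty string and the default is not falsy, the or branch fires even on a successful match. A sketch with a hypothetical 'N/A' default and a \w* group, compared with a conditional expression:

```python
import re

PATTERN = re.compile(r'key=(\w*)')  # \w* can match an empty string

and_or = lambda text: (lambda m: m and m.group(1) or 'N/A')(PATTERN.search(text))
cond = lambda text: (lambda m: m.group(1) if m else 'N/A')(PATTERN.search(text))

print(repr(and_or('key=')))  # 'N/A' (wrong: the key is present, its value is empty)
print(repr(cond('key=')))    # ''    (the conditional expression keeps the match)
```

With the empty-string default used above, both forms happen to give the same result.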

Indeed, in the Scheme programming language, all local variable binding constructs (such as let) can be derived from lambda applications.
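For illustration, Scheme's (let ((x 1) (y 2)) (+ x y)) is equivalent to ((lambda (x y) (+ x y)) 1 2). The same shape in Python binds names as lambda parameters and applies the lambda immediately:

```python
# Scheme:  (let ((x 1) (y 2)) (+ x y))  ==  ((lambda (x y) (+ x y)) 1 2)
# Python: x and y are bound as parameters, then the lambda is applied at once.
result = (lambda x, y: x + y)(1, 2)
print(result)  # 3
```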

Re: "Is there a simple and pythonic way to do this in one line?" The answer is no. Any means of getting this to work in one line (without defining your own wrapper) is going to be uglier to read than the ways you've already presented. But defining your own wrapper is perfectly Pythonic, as is using two quite readable lines instead of a single difficult-to-read line.

Update for Python 3.8+: The new "walrus operator" introduced with PEP 572 does allow this to be a one-liner without convoluted tricks:

value = match.group(1) if (match := re.search('regex', 'text')) else ''
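The name bound with := is an ordinary variable in the enclosing scope, so it is still available after the expression. The parentheses around the assignment expression are required in the condition of a conditional expression. A sketch with illustrative inputs (Python 3.8+):

```python
import re

# Successful search: `match` is bound by := and used in the same expression.
value = match.group(1) if (match := re.search(r'key=(\w+)', 'foo=bar,key=value')) else ''
print(value, match.span(1))  # value (12, 17)

# Failed search: the condition is None, so the else branch runs.
value = match.group(1) if (match := re.search(r'key=(\w+)', 'no match here')) else ''
print(repr(value), match)    # '' None
```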

Many would consider this Pythonic, particularly those who supported the PEP. However, it should be noted that there was fierce opposition to it as well. The conflict was so intense that Guido van Rossum stepped down from his role as Python's BDFL the day after announcing his acceptance of the PEP.

You can do it as:

value = re.search('regex', 'text').group(1) if re.search('regex', 'text') else ''

It's not terribly efficient, though, because the regex runs twice.

Or, to run it only once, as @Kevin suggested:

value = (lambda match: match.group(1) if match else '')(re.search(regex,text))

One-liners, one-liners... Why can't you write it on two lines?

getattr(re.search('regex', 'text'), 'group', lambda x: '')(1)
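This works because getattr(obj, name, default) returns default when obj has no attribute name, and None has no group attribute, so the lambda stands in for match.group. A sketch wrapping the trick in a hypothetical find function:

```python
import re


def find(text):
    # On no match, None has no attribute 'group', so getattr returns the
    # lambda, which ignores its argument (the 1) and returns ''.
    return getattr(re.search(r'key=(\w+)', text), 'group', lambda x: '')(1)


print(repr(find('foo=bar,key=value')))  # 'value'
print(repr(find('no match here')))      # ''
print(hasattr(None, 'group'))           # False
```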

Your second solution is fine. Make a function out of it if you wish. My solution is for demonstration purposes only and is in no way Pythonic.

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can name the regex search expression pattern.search(text) in order to both check whether there is a match (pattern.search(text) returns either None or a re.Match object) and use it to extract the matching group:

# pattern = re.compile(r'key=(\w+)')
match.group(1) if (match := pattern.search('foo=bar,key=value,beer=pub')) else ''
# 'value'
match.group(1) if (match := pattern.search('no match here')) else ''
# ''
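The same expression also works inside a list comprehension, which is handy for extracting the group from many strings at once. A sketch, assuming a hypothetical list texts (Python 3.8+):

```python
import re

pattern = re.compile(r'key=(\w+)')
texts = ['foo=bar,key=value,beer=pub', 'no match here', 'key=42']

# Each element binds `match` with := and uses it in the same expression.
values = [match.group(1) if (match := pattern.search(t)) else '' for t in texts]
print(values)  # ['value', '', '42']
```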
Licensed under: CC-BY-SA with attribution