Why does my non-greedy Perl regex match nothing?
23-08-2019
Question
I thought I understood Perl RE to a reasonable extent, but this is puzzling me:
#!/usr/bin/perl
use strict;
use warnings;
my $test = "'some random string'";
if($test =~ /\'?(.*?)\'?/) {
print "Captured $1\n";
print "Matched $&";
}
else {
print "What?!!";
}
prints
Captured
Matched '
It seems it has matched the ending ' alone, and so captured nothing.
I would have expected it to match the entire thing, or if it's totally non-greedy, nothing at all (as everything there is an optional match).
This in between behaviour baffles me, can anyone explain what is happening?
Solution
The \'? at the beginning and end means match 0 or 1 apostrophes greedily. (As another poster has pointed out, to make it non-greedy, it would have to be \'??)
The .*? in the middle means match 0 or more characters non-greedily.
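The Perl regular expression engine starts at the first position of the string. The leading \'? is greedy, so it picks up the first apostrophe. The .*? then takes as little as it can, which is nothing at all. The trailing \'? is tried at the next character, which is s rather than an apostrophe, so it matches the empty string. All three parts have now succeeded, so the engine stops there: the overall match is the opening apostrophe alone (not the closing one), and $1 is empty.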
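To check this, here is a minimal sketch that reports where the match starts and ends, using Perl's built-in @- and @+ offset arrays. The variable names ($start, $end, and so on) are only for illustration.

```perl
use strict;
use warnings;

my $test = "'some random string'";
my ($start, $end, $matched, $captured);

# Same pattern as in the question.
if ($test =~ /\'?(.*?)\'?/) {
    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    ($start, $end) = ($-[0], $+[0]);
    ($matched, $captured) = ($&, $1);
    print "whole match: offsets $start..$end, text [$matched]\n";
    print "group 1: text [$captured]\n";
}
# whole match: offsets 0..1, text [']
# group 1: text []
```

Offsets 0..1 show that the apostrophe in $& is the opening one, not the closing one.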
OTHER TIPS
I think you mean something like:
/'(.*?)'/ # matches everything in single quotes
or
/'[^']*'/ # matches everything in single quotes, but faster
The single quotes don't need to be escaped, AFAIK.
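A quick sketch of both patterns against the string from the question. The capturing parentheses in the second pattern are an addition for illustration (the pattern as posted has none), and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $test = "'some random string'";

# Lazy .*? between two mandatory quotes; a list-context match returns the captures.
my ($lazy) = $test =~ /'(.*?)'/;

# Negated character class. The parentheses are added here (an assumption,
# not part of the posted pattern) so the content can be captured.
my ($negated) = $test =~ /'([^']*)'/;

print "lazy:    [$lazy]\n";     # lazy:    [some random string]
print "negated: [$negated]\n";  # negated: [some random string]
```

Because both quotes are mandatory here, the engine cannot settle for an empty match the way the pattern in the question does.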
pattern? is greedy; if you want it to be non-greedy you must say pattern??:
#!/usr/bin/perl
use strict;
use warnings;
my $test = "'some random string'";
if($test =~ /\'?(.*?)\'?/) {
print "Captured [$1]\n";
print "Matched [$&]\n";
}
if($test =~ /\'??(.*?)\'??/) {
print "Captured [$1]\n";
print "Matched [$&]\n";
}
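For reference, a compact sketch of the same comparison that records what each variant matches. The labels and the %matched hash are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

my $test = "'some random string'";

# Label => pattern pairs: greedy optional quotes versus lazy optional quotes.
my @patterns = (
    [ 'greedy' => qr/\'?(.*?)\'?/ ],
    [ 'lazy'   => qr/\'??(.*?)\'??/ ],
);

my %matched;
for my $pair (@patterns) {
    my ($label, $re) = @$pair;
    if ($test =~ $re) {
        $matched{$label} = $&;
        print "${label}: matched [$&], captured [$1]\n";
    }
}
# greedy: matched ['], captured []
# lazy: matched [], captured []
```

With ?? the leading apostrophe is skipped as well, so the whole match is the empty string at the start of the input.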
from perldoc perlre:
The following standard quantifiers are recognized:
*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times
By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don’t change, just the "greediness":
*?      Match 0 or more times
+?      Match 1 or more times
??      Match 0 or 1 time
{n}?    Match exactly n times
{n,}?   Match at least n times
{n,m}?  Match at least n but not more than m times
Beware of making all elements of your regex optional (i.e. having all elements quantified with * or ?). Such a pattern can succeed by matching the empty string, so the Perl regex engine reports a successful match even when it has matched nothing useful.
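A small sketch of that pitfall: the all-optional pattern from the question succeeds even on input with no quotes at all, while a pattern with mandatory quotes fails as you would hope. The inputs and the %result hash are made up for this illustration.

```perl
use strict;
use warnings;

my $all_optional = qr/\'?(.*?)\'?/;  # pattern from the question
my $mandatory    = qr/'(.*?)'/;      # both quotes required

my %result;
for my $input ("no quotes here", "") {
    # Record whether each pattern reports success on this input.
    $result{$input}{optional}  = ($input =~ $all_optional) ? "matched" : "failed";
    $result{$input}{mandatory} = ($input =~ $mandatory)    ? "matched" : "failed";
    print "input [$input]: all-optional $result{$input}{optional}, ",
          "mandatory $result{$input}{mandatory}\n";
}
# input [no quotes here]: all-optional matched, mandatory failed
# input []: all-optional matched, mandatory failed
```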
I suspect what you want is
/'(.*?)'/
I would say the closest answer to what you are looking for is
/'?([^']*)'?/
So "get the single quote if it's there", "get anything and everything that's not a single quote", "get the last single quote if it's there".
Unless you want to match "'don't do this'" - but who uses an apostrophe in a single quote anyway (and gets away with it for long)? :)
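Here is a rough sketch of how that pattern behaves on the string from the question and on the awkward embedded-apostrophe case. The %captured hash is a hypothetical name for this illustration.

```perl
use strict;
use warnings;

my %captured;
for my $input ("'some random string'", "'don't do this'") {
    # Optional quote, then everything that is not a quote, then optional quote.
    if ($input =~ /'?([^']*)'?/) {
        $captured{$input} = $1;
        print "input [$input]: captured [$1], matched [$&]\n";
    }
}
# input ['some random string']: captured [some random string], matched ['some random string']
# input ['don't do this']: captured [don], matched ['don']
```

In the second case the apostrophe in "don't" is taken as the closing quote, so the capture stops at "don".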