Question

I thought I understood Perl RE to a reasonable extent, but this is puzzling me:

#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

if($test =~ /\'?(.*?)\'?/) {
       print "Captured $1\n";
       print "Matched $&";
}
else {
       print "What?!!";
}

prints

Captured
Matched '

It seems it has matched the ending ' alone, and so captured nothing.
I would have expected it to match the entire thing, or if it's totally non-greedy, nothing at all (as everything there is an optional match).
This in-between behaviour baffles me. Can anyone explain what is happening?


Solution

The \'? at the beginning and end means match 0 or 1 apostrophes greedily. (As another poster has pointed out, to make it non-greedy, it would have to be \'??)

The .*? in the middle means match 0 or more characters non-greedily.

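The Perl regular expression engine starts at the beginning of the string and stops at the first match it finds. The leading \'? is greedy, so it consumes the first apostrophe. The .*? then takes as little as it can, which is nothing at all. The trailing \'? is optional, and since the next character is s rather than an apostrophe, it matches the empty string. At that point the whole pattern has succeeded, so the engine stops: $& is the opening apostrophe (not the closing one) and $1 is empty.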
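To check which apostrophe was actually matched, one option is to look at the match offsets. The following is a minimal sketch using Perl's built-in @- and @+ arrays; the sub name match_span is only an illustrative helper and is not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run the pattern from the question and report
# where the overall match starts and ends, plus what group 1 captured.
sub match_span {
    my ($text) = @_;
    return unless $text =~ /\'?(.*?)\'?/;

    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    # Copy them (and $1) into lexicals before returning.
    my ($start, $end, $captured) = ($-[0], $+[0], $1);
    return ($start, $end, $captured);
}

my ($start, $end, $captured) = match_span("'some random string'");
print "match spans offsets $start..$end, captured [$captured]\n";
```

A start offset of 0 means the match begins at the very first character, so the apostrophe in $& is the opening one.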

OTHER TIPS

I think you mean something like:

/'(.*?)'/      # matches everything in single quotes

or

/'[^']*'/      # matches everything in single quotes, but faster

The single quotes don't need to be escaped, AFAIK.
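For comparison, here is a small sketch that runs both suggested patterns side by side. The sub names are hypothetical, and a capture group has been added to the second pattern (the original has none) so that both return the quoted text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: return the text between the first pair of
# single quotes, or undef if there is no such pair.
sub between_quotes_lazy {
    my ($text) = @_;
    return $text =~ /'(.*?)'/ ? $1 : undef;      # lazy dot
}

sub between_quotes_class {
    my ($text) = @_;
    return $text =~ /'([^']*)'/ ? $1 : undef;    # negated character class
}

for my $text ("'some random string'", "a 'b' c 'd'", "no quotes") {
    my $lazy  = between_quotes_lazy($text);
    my $class = between_quotes_class($text);
    printf "%-24s lazy=%s class=%s\n", $text,
        defined $lazy  ? "[$lazy]"  : "(no match)",
        defined $class ? "[$class]" : "(no match)";
}
```

Because the quotes are mandatory here, neither pattern can succeed by matching nothing. One difference worth knowing: . does not match a newline unless the /s modifier is used, so the two versions can disagree on multi-line input.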

pattern? is greedy; if you want it to be non-greedy, you must say pattern??:

#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

if($test =~ /\'?(.*?)\'?/) {
       print "Captured [$1]\n";
       print "Matched  [$&]\n";
}
if($test =~ /\'??(.*?)\'??/) {
       print "Captured [$1]\n";
       print "Matched  [$&]\n";
}

From perldoc perlre:

The following standard quantifiers are recognized:

*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times

By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don’t change, just the "greediness":

*?     Match 0 or more times
+?     Match 1 or more times
??     Match 0 or 1 time
{n}?   Match exactly n times
{n,}?  Match at least n times
{n,m}? Match at least n but not more than m times

Beware of making every element of your regex optional (i.e. quantifying all of them with * or ?). Such a pattern can match the empty string, so the match always succeeds, even when it consumes little or nothing of what you meant to capture.
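A quick way to see this is to try a few all-optional patterns against different inputs, including an empty string. This is only a sketch: matched_text is a hypothetical helper, and x*y? is an extra pattern added purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Every element in each of these patterns is optional.
my @patterns = (qr/\'?(.*?)\'?/, qr/\'??(.*?)\'??/, qr/x*y?/);

# Hypothetical helper: return the matched text ($&), or undef on failure.
sub matched_text {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    my $matched = $&;    # copy before leaving the sub
    return $matched;
}

for my $text ("'some random string'", "no quotes here", "") {
    for my $re (@patterns) {
        my $m = matched_text($text, $re);
        printf "input [%s] pattern %s: %s\n", $text, $re,
            defined $m ? "matched [$m]" : "no match";
    }
}
```

None of these combinations should report "no match", and most of the matches are empty, which is why the success of such a pattern says very little about the input.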

I suspect what you want is

/'(.*?)'/

I would say the closest answer to what you are looking for is

/'?([^']*)'?/

So "get the single quote if it's there", "get anything and everything that's not a single quote", "get the last single quote if it's there".

Unless you want to match "'don't do this'" - but who uses an apostrophe inside a single-quoted string anyway (and gets away with it for long)? :)
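Here is a short sketch of how that pattern behaves, including the embedded-apostrophe case. The sub name strip_optional_quotes is a hypothetical helper, not something from the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: capture the run of non-quote characters,
# allowing an optional single quote on either side.
sub strip_optional_quotes {
    my ($text) = @_;
    return $text =~ /'?([^']*)'?/ ? $1 : undef;
}

for my $text ("'some random string'", "some random string", "'don't do this'") {
    printf "%-24s => [%s]\n", $text, strip_optional_quotes($text);
}
```

The first two inputs both yield the full phrase, since the greedy [^']* takes everything up to the next quote or the end of the string. The third input is cut off at the embedded apostrophe, so only "don" is captured.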

Licensed under: CC-BY-SA with attribution