Question

I'm having a hard time determining which characters must be escaped when using Perl's qr{} construct.

I'm attempting to create a multi-line precompiled regex that matches text containing many characters that normally need escaping (#*.>:[]) and that also embeds another precompiled regex. Additionally, I need to match as strictly as possible for testing purposes.

my $output = q{# using defaults found in .config
*
*
Options:
  1. opt1
> 2. opt2
choice[1-2?]: };

my $sc = qr{(>|\s)}smx;
my $re = qr{# using defaults found in .config
*
*
Options:
$sc 1. opt1
$sc 2. opt2
choice[1-2?]: }mx;

if ( $output =~ $re ) {
  print "OK!\n";
}
else {
  print "D'oh!\n";
}

Error:

Quantifier follows nothing in regex; marked by <-- HERE in m/# using defaults found in .config
* <-- HERE 
*
Options:
(?msx-i:(>|\s)) 1. opt1
(?msx-i:(>|\s)) 2. opt2
choice[1-2?]: / at ./so.pl line 14.

Escaping the asterisks results in a failed match (the D'oh! output), and so does escaping the other pesky characters. I could keep trying different combinations of what to escape, but there are a lot of variations here, and I'm hoping someone can provide some insight.


Solution

You have to escape the delimiter for qr//, and you have to escape any regex metacharacters that you want to use as literals. If you want those to be literal *'s, you need to escape them since the * is a regex quantifier.

The rest of your problem is the regex flags that you've added. The /m doesn't do anything because you don't use the beginning- or end-of-string anchors (^, $). The /s doesn't do anything useful because you never want . to act as a wildcard (the dots in your text are meant literally). The /x makes all of the unescaped whitespace in your regex meaningless, and it turns that line with the # into a regex comment.
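To see what /x does in isolation, here is a minimal sketch. The test strings and the variable names ($commented, $squeezed, $escaped) are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Under /x, an unescaped '#' starts a comment that runs to the end of the
# line, so this pattern is equivalent to qr{\Aa\z}.
my $commented = qr{\A a # b
\z}x;

# Under /x, unescaped whitespace is ignored: equivalent to qr{\Aab\z}.
my $squeezed = qr{\A a b \z}x;

# Escaping the spaces and the '#' makes them literal again under /x.
my $escaped = qr{\A a \  \# \  b \z}x;

# Print 1 or 0 for each pattern against each test string.
for my $string ( 'a', 'ab', 'a # b' ) {
    printf "%-7s commented=%d squeezed=%d escaped=%d\n",
        "'$string'",
        ( $string =~ $commented ? 1 : 0 ),
        ( $string =~ $squeezed  ? 1 : 0 ),
        ( $string =~ $escaped   ? 1 : 0 );
}
```

Only the last pattern matches the string 'a # b'. The first two silently match something shorter, which is the same thing that happens to the # line and the spaces in your original regex.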

This is what you want, with regex flags removed and the proper things escaped:

my $sc = qr{(>|\s)};

my $re = qr{# using defaults found in \.config
\*
\*
Options:
$sc 1\. opt1
$sc 2\. opt2
choice\[1-2\?]: };

Although Damian Conway tells people in Perl Best Practices to always put these options on their regexes, you now see why he's wrong. You should only add them when you want what they do, and you should only add things when you know what they do. :)

Here's what you might do if you want to use /x. You have to escape any literal whitespace, you need to denote the line endings somehow, and you have to escape the literal # character. What was readable before is now a mess:

my $sc  = qr{(>|\s)};
my $eol = qr{[\r\n]+};

my $re  = qr{\# \s+ using \s+ defaults \s+ found \s+ in \s+ \.config $eol
\*                    $eol
\*                    $eol
Options:              $eol
$sc \s+ 1\. \s+ opt1   $eol
$sc \s+ 2\. \s+ opt2   $eol
choice\[1-2\?]: \s+
}x;

if ( $output =~ $re ) {
  print "OK!\n";
}
else {
  print "D'oh!\n";
}

OTHER TIPS

It sounds like what you really want is Expect, but the thing you are most immediately looking for is the quotemeta function (or \Q...\E inside a pattern), which backslash-escapes every character that could have a special meaning in a regex.

To answer your question directly, however: in addition to the closing delimiter (in this case }), you need to escape, at a minimum, .[$()|*+?{\
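As a rough sketch of the quotemeta approach, the pattern can be assembled line by line so that only $sc keeps its regex meaning. The names @lines and $body are hypothetical helpers introduced here, and the \A and \z anchors are an assumption about how strict the match should be.

```perl
use strict;
use warnings;

my $output = q{# using defaults found in .config
*
*
Options:
  1. opt1
> 2. opt2
choice[1-2?]: };

my $sc = qr{(>|\s)};

# quotemeta backslash-escapes every non-word character, so each literal
# line is matched exactly; $sc is concatenated in unquoted.
my @lines = (
    quotemeta('# using defaults found in .config'),
    quotemeta('*'),
    quotemeta('*'),
    quotemeta('Options:'),
    $sc . quotemeta(' 1. opt1'),
    $sc . quotemeta(' 2. opt2'),
    quotemeta('choice[1-2?]: '),
);

# Join the pieces with literal newlines and anchor both ends.
my $body = join "\n", @lines;
my $re   = qr{\A$body\z};

if ( $output =~ $re ) {
    print "OK!\n";
}
else {
    print "D'oh!\n";
}
```

This prints OK! for the sample output above. The trade-off is that the pattern no longer looks like the text it matches, but there is nothing left to escape by hand.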

As brian said, you must escape the delimiter and regex metacharacters. Note that when using qr//x (which you are), you must also escape whitespace characters and # (which is a comment marker). You probably don't actually want to use /x here. If you want to be safe, you can escape any non-alphanumeric character.
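A small sketch of the "escape every non-alphanumeric character" idea, using quotemeta to do the escaping. The variable names $literal and $quoted are illustrative only.

```perl
use strict;
use warnings;

my $literal = '# using defaults found in .config';

# quotemeta puts a backslash in front of every non-word character,
# including the '#', the spaces, and the '.'.
my $quoted = quotemeta $literal;
print "$quoted\n";

# Because the spaces and the '#' are escaped, the quoted text keeps its
# literal meaning even when interpolated into a /x pattern.
my $re = qr{ \A $quoted \z }x;

print $literal =~ $re ? "OK!\n" : "D'oh!\n";
```

The escaped string looks noisy, but it can be dropped into a pattern with or without /x and it matches the same text either way.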

Licensed under: CC-BY-SA with attribution