Question

Suppose I have a file containing lines I'm trying to match against:

foo
quux
bar

In my code, I have another array:

foo
baz
quux

Let's say we iterate through the file, calling each element $word, and the internal list we are checking against, @arr.

if( grep {$_ =~ m/^$word$/i} @arr)

This works correctly, but if the file contains a test case such as fo., the . acts as a wildcard in the regex, so fo. matches foo, which is not acceptable.

This is of course because Perl is interpolating the variable into a regex.

The question:

How do I force Perl to use the variable literally?


Solution

The correct answer is: don't use regexps. I'm not saying regexps are bad, but using them for what amounts to a simple equality check is overkill.

Use: grep { lc($_) eq lc($word) } @arr and be happy.
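As a rough sketch of how that comparison behaves on the question's sample list, here is a complete script. The helper name matches_literal and the extra test words (QUUX, fo.) are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# The internal list from the question.
my @arr = qw(foo baz quux);

# Hypothetical helper: returns 1 if $word equals any list element,
# ignoring case, and 0 otherwise. No regex is involved, so characters
# such as '.' have no special meaning.
sub matches_literal {
    my ($word, @list) = @_;
    my $count = grep { lc($_) eq lc($word) } @list;
    return $count ? 1 : 0;
}

for my $word ('foo', 'QUUX', 'bar', 'fo.') {
    my $result = matches_literal($word, @arr) ? 'match' : 'no match';
    print "$word: $result\n";
}
```

Here fo. is reported as no match because eq compares the strings character by character. For non-ASCII text, fc (available from Perl 5.16 with use feature 'fc') is the case-folding function meant for this kind of comparison; lc is enough for plain ASCII words like these.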

OTHER TIPS

Use \Q...\E to escape regex metacharacters in the interpolated variable's value, directly inside the pattern:

if( grep {$_ =~ m/^\Q$word\E$/i} @arr)
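To see the difference, the following minimal sketch runs the question's pattern with and without \Q...\E against the same list. The variable names $unescaped and $escaped are assumptions chosen for this illustration.

```perl
use strict;
use warnings;

my @arr  = qw(foo baz quux);
my $word = 'fo.';

# Unescaped: the dot is a metacharacter, so 'fo.' matches 'foo'.
# grep in scalar context returns the number of matching elements.
my $unescaped = grep { $_ =~ m/^$word$/i } @arr;

# Escaped: \Q...\E backslashes the dot after interpolation,
# so only a literal 'fo.' (in any case) would match.
my $escaped = grep { $_ =~ m/^\Q$word\E$/i } @arr;

print "unescaped matches: $unescaped\n";
print "escaped matches: $escaped\n";
```

With this data the unescaped pattern reports one match (foo) and the escaped pattern reports none.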

From perlfaq6's answer to How do I match a regular expression that's in a variable?:


We don't have to hard-code patterns into the match operator (or anything else that works with regular expressions). We can put the pattern in a variable for later use.

The match operator is a double quote context, so you can interpolate your variable just like a double quoted string. In this case, you read the regular expression as user input and store it in $regex. Once you have the pattern in $regex, you use that variable in the match operator.

chomp( my $regex = <STDIN> );

if( $string =~ m/$regex/ ) { ... }

Any regular expression special characters in $regex are still special, and the pattern still has to be valid or Perl will complain. For instance, in this pattern there is an unpaired parenthesis.

my $regex = "Unmatched ( paren";

"Two parens to bind them all" =~ m/$regex/;

When Perl compiles the regular expression, it treats the parenthesis as the start of a memory match. When it doesn't find the closing parenthesis, it complains:

Unmatched ( in regex; marked by <-- HERE in m/Unmatched ( <-- HERE  paren/ at script line 3.

You can get around this in several ways depending on our situation. First, if you don't want any of the characters in the string to be special, you can escape them with quotemeta before you use the string.

chomp( my $regex = <STDIN> );
$regex = quotemeta( $regex );

if( $string =~ m/$regex/ ) { ... }

You can also do this directly in the match operator using the \Q and \E sequences. The \Q tells Perl where to start escaping special characters, and the \E tells it where to stop (see perlop for more details).

chomp( my $regex = <STDIN> );

if( $string =~ m/\Q$regex\E/ ) { ... }

Alternately, you can use qr//, the regular expression quote operator (see perlop for more details). It quotes and perhaps compiles the pattern, and you can apply regular expression flags to the pattern.

chomp( my $input = <STDIN> );

my $regex = qr/$input/is;

$string =~ m/$regex/  # same as m/$input/is;

You might also want to trap any errors by wrapping an eval block around the whole thing.

chomp( my $input = <STDIN> );

eval {
    if( $string =~ m/\Q$input\E/ ) { ... }
    };
warn $@ if $@;

Or...

my $regex = eval { qr/$input/is };
if( defined $regex ) {
    $string =~ m/$regex/;
    }
else {
    warn $@;
    }

Use quotemeta:

Returns the value of EXPR with all non-"word" characters backslashed.

http://perldoc.perl.org/functions/quotemeta.html
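A short sketch of what quotemeta does to the problem string from the question. The list here has an extra fo. element added as an assumption, so that the escaped pattern has something to match.

```perl
use strict;
use warnings;

my $word    = 'fo.';
my $escaped = quotemeta($word);    # the dot gets a backslash in front of it

print "$escaped\n";

# Assumed test data: the question's list plus a literal 'fo.' entry.
my @arr  = qw(foo baz quux fo.);
my @hits = grep { /^$escaped$/i } @arr;

print "@hits\n";
```

The first line printed is fo\. and the second is fo., since foo no longer satisfies the pattern once the dot is escaped.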

I don't think you want a regex in this case since you aren't matching a pattern. You're looking for a literal sequence of characters that you already know. Build a hash with the values to match and use that to filter @arr:

open my $fh, '<', $filename or die "...";
my %hash = map { chomp; lc($_), 1 } <$fh>;

foreach my $item ( @arr ) {
    next unless exists $hash{ lc($item) };
    print "I matched [$item]\n";
}
Licensed under: CC-BY-SA with attribution