Question

I have the following Perl script to extract numbers from a log. It seems that the non-capturing group with ?: isn't working when I define the sub-pattern in a variable. It only works when I leave out the grouping in either the regex pattern or the sub-pattern in $number.

#!/usr/bin/perl
use strict;
use warnings;

my $number = '(:?-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?)';
#my $number = '-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?';

open(FILE,"file.dat") or die "Exiting with: $!\n";
while (my $line = <FILE>) {
    if ($line =~ m{x = ($number). y = ($number)}) {
        print "\$1= $1\n";
        print "\$2= $2\n";
        print "\$3= $3\n";
        print "\$4= $4\n";
    }
}
close(FILE);

The output for this code looks like:

$1= 12.15
$2= 12.15
$3= 3e-5
$4= 3e-5

for an input of:

asdf x = 12.15. y = 3e-5 yadda

Those doubled outputs aren't desired.

Is this because of the m{} style, as opposed to the regular m// style? The former is the only style I know for getting variables (sub-patterns) into my regular expressions. I only noticed this with the captured groups ($1, $2, ...), so are there possibly other differences for metacharacters?


Solution

The delimiters you use for the regular expression aren't causing any problems but the following is:

(:?-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?)
 ^^
Notice that this isn't a non-capturing group: it is a capturing group that starts with an optional colon (:?). A non-capturing group is written (?: instead.

This is probably a typo, but it is what causes the trouble.

Edit: It looks like it is not just a typo. I substituted the variable into the regex and got this:

x = ((:?-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?)). y = ((:?-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?))
    ^^           first and second group               ^^      ^^    third and fourth group                      ^^

As you can see, the first and second capturing groups capture exactly the same text, and the same happens with the third and fourth capturing groups.
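A small sketch to confirm this: a match in list context returns one value per capturing group, so counting the returned values shows the effect of (:? versus (?:. The names $typo, $fixed, @with_typo and @with_fix are made up for this illustration and are not from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern from the question: "(:?" is a capturing group plus an optional colon.
my $typo  = '(:?-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?)';
# Same pattern with the outer group written as a real non-capturing "(?:".
my $fixed = '(?:-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?)';

my $line = 'asdf x = 12.15. y = 3e-5 yadda';

# In list context a match returns the contents of every capturing group.
my @with_typo = $line =~ m{x = ($typo). y = ($typo)};
my @with_fix  = $line =~ m{x = ($fixed). y = ($fixed)};

print scalar(@with_typo), " groups: @with_typo\n";  # 4 groups: 12.15 12.15 3e-5 3e-5
print scalar(@with_fix),  " groups: @with_fix\n";   # 2 groups: 12.15 3e-5
```

With the corrected (?: only the parentheses written in the m{} expression capture, so $1 and $2 hold the two numbers.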

Other tips

You're going to kick yourself...

Your regexp reads out as:

capture {
 maybe-colon
 maybe-minus
 cluster {     (?:(?:\d+\.?\d*)|(?:\.\d+))
  cluster {    (?:\d+\.?\d*)
   1+ digits
   maybe-dot
   0+ digits
  }
  -or-
  cluster {    (?:\.\d+)
   dot
   1+ digits
  }
 }
 maybe cluster {
   E or e
   maybe + or -
   1+ digits
 }             (?:[Ee][+-]?\d+)?
}

... which is what you're looking for.

However, when you then do your actual regexp, you do:

$line =~ m{x = ($number). y = ($number)}

(The curly braces are a distraction: you may use any non-whitespace \W character as the delimiter once the m or s has been specified.)
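To illustrate the point about delimiters, this sketch runs the same match with three delimiter styles; variables interpolate in all of them. The simplified $num pattern and the sample line are assumptions chosen for brevity, not the pattern from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified number pattern, for illustration only.
my $num  = '\d+';
my $line = 'x = 42';

# Variables interpolate the same way regardless of the delimiter chosen.
my ($slash) = $line =~ /x = ($num)/;     # default delimiter, "m" is optional
my ($curly) = $line =~ m{x = ($num)};    # paired braces, "m" is required
my ($bang)  = $line =~ m!x = ($num)!;    # other punctuation works as well

print "$slash $curly $bang\n";   # prints "42 42 42"
```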

This asks Perl to capture whatever the regexp in $number matches, and that regexp is itself a capture. Hence $1 and $2 are the same thing, and so are $3 and $4.

Simply remove the capturing parentheses from either $number or the regexp line.
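One way to apply that fix, sketched here with qr// so the sub-pattern carries no capturing group of its own. Reading from an in-memory list instead of file.dat, and the names @lines and @pairs, are assumptions made to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No capturing parentheses inside the sub-pattern; qr// wraps it in its
# own non-capturing group when it is interpolated into a larger regex.
my $number = qr/-?(?:\d+\.?\d*|\.\d+)(?:[Ee][+-]?\d+)?/;

# Stand-in for the lines of file.dat (assumed sample input).
my @lines = ('asdf x = 12.15. y = 3e-5 yadda');

my @pairs;
for my $line (@lines) {
    # Only the parentheses written here capture, so there are exactly two groups.
    if (my ($x, $y) = $line =~ m{x = ($number). y = ($number)}) {
        push @pairs, [$x, $y];
        print "x = $x, y = $y\n";   # prints "x = 12.15, y = 3e-5"
    }
}
```

Assigning the match to a list of named variables also avoids relying on $1 and $2, which keep their old values after a failed match.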

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow