Question

I am using capture groups to match a certain pattern, and am not getting quite the results I expect. The pattern of interest is as follows:

([0-9]+(\.[0-9]+)+)

For string 1.23, I get $1=1.23, and $2=.23 which makes sense to me.

But for string 1.2.3, I get $1=1.2.3 and $2=.3, where I would expect $2=.2.3, because its group is a decimal point and a digit, repeated.

Can someone please explain to me how this works? Thank you!

Solution 2

"These pattern match variables are scalars and, as such, will only hold a single value. That value is whatever the capturing parentheses matched last."

http://blogs.perl.org/users/sirhc/2012/05/repeated-capturing-and-parsing.html

In your example, $1 matches 1.2.3. As the inner group repeats, $2 is first set to .2 and is then overwritten by the final repetition, .3.
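
To see the overwriting in practice, here is a minimal sketch that runs the question's pattern against both strings. The helper name captures is hypothetical and only used for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return ($1, $2) for the question's pattern,
# or an empty list when the string does not match.
sub captures {
    my ($str) = @_;
    return $str =~ /([0-9]+(\.[0-9]+)+)/ ? ($1, $2) : ();
}

for my $str ('1.23', '1.2.3') {
    my ($whole, $last) = captures($str);
    # $2 only ever holds the text from the last pass through the inner group
    print "$str: \$1=$whole \$2=$last\n";
}
```

For 1.2.3 this prints $1=1.2.3 and $2=.3, since the earlier value .2 is replaced when the group matches a second time.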

OTHER TIPS

When you use capturing groups with a quantifier, only the last repetition of the captured pattern will be stored.

Perhaps this regex will meet your needs:

\b(\d+)((?:\.\d+)+)\b

This regex separates the leading integer sequence from its repeating fractional components.

(As indicated by @ysth, please keep in mind that \d may match more characters than you intend. If that is the case, use the character class [0-9] instead or use the /a modifier.)
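
As a rough illustration of that caveat, the sketch below tests a single non-ASCII digit against \d, [0-9], and \d with /a. The helper name digit_report is hypothetical, and the example assumes Perl 5.14 or later, since that is when the /a modifier was introduced.

```perl
use 5.014;      # the /a modifier needs Perl 5.14 or later
use strict;
use warnings;

# Hypothetical helper: report which of the three patterns match one character.
sub digit_report {
    my ($char) = @_;
    my %report = (
        d     => ($char =~ /^\d$/    ? 1 : 0),   # Unicode-aware \d
        class => ($char =~ /^[0-9]$/ ? 1 : 0),   # explicit ASCII class
        d_a   => ($char =~ /^\d$/a   ? 1 : 0),   # \d restricted to ASCII by /a
    );
    return \%report;
}

my $r = digit_report("\x{0663}");   # ARABIC-INDIC DIGIT THREE
print "\\d: $r->{d}  [0-9]: $r->{class}  \\d with /a: $r->{d_a}\n";
```

Here only the plain \d matches the Arabic-Indic digit; both [0-9] and \d under /a reject it.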

Here's a Perl program that demonstrates this regex on a sample data set.

#!/usr/bin/perl -w

use strict;
use warnings;

while (<DATA>) {
    chomp;

    # A - A sequence of digits
    # B - A period and a sequence of digits
    # C - Repeat 'B'.

    if (/\b(\d+)((?:\.\d+)+)\b/) {
#           ^^^     ^^^^^
#            A        B
#                   ^^^^^^^
#                      C

        print "[$1]  [$2]\n";
    }
}

__END__
1.23
123.456
1.2.3
1.22.333.444

Expected Output:

[1]  [.23]
[123]  [.456]
[1]  [.2.3]
[1]  [.22.333.444]
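
If each repetition is needed separately rather than as one string, one possible approach (not part of the answer above) is to run a second, global match over the text captured in $2. The sketch below assumes that is what is wanted; the helper name split_components is hypothetical, and it uses [0-9] rather than \d to stay ASCII-only.

```perl
use strict;
use warnings;

# Hypothetical helper: split a dotted number into its leading integer
# and a list of the individual fractional components.
sub split_components {
    my ($str) = @_;
    return unless $str =~ /\b([0-9]+)((?:\.[0-9]+)+)\b/;
    my ($head, $tail) = ($1, $2);
    # In list context, a /g match returns every capture from every repetition
    my @parts = $tail =~ /\.([0-9]+)/g;
    return ($head, @parts);
}

my ($head, @parts) = split_components('1.22.333.444');
print "[$head]  [@parts]\n";   # prints "[1]  [22 333 444]"
```

The first match validates the overall shape, and the second walks the tail one component at a time, so no value is lost to overwriting.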
Licensed under: CC-BY-SA with attribution