Question

In Perl, I am trying to capture any digits that precede a known phone line number, if they exist. There will be no dashes, only digits.

For example, say I know the line number will always be 8675309. 8675309 may or may not have leading digits; if it does, I want to capture them. There is no real limit on the number of leading digits.

$input          $digits       $number
'8675309'       ''            '8675309'
'8008675309'    '800'         '8675309'
'18888675309'   '1888'        '8675309'
'18675309'      '1'           '8675309'
'86753091'      not a match

/8675309$/ will match, but how do I capture the leading digits in the same regex?

Solution

Some regexes work better backwards than forwards. So sometimes it is useful to use a sexeger (a regex applied to the reversed string) rather than a regex.

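my $pn = '18008675309';

# Match the reversed line number first, then capture the remaining digits.
my $got;
if (reverse($pn) =~ /^9035768(\d*)$/) {
    $got = reverse $1;    # reverse the capture back: '1800'
}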

The regex is cleaner and avoids a lot of backtracking, at the cost of some fuss with reversing the input and the captured values.
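To see the reversed match against the inputs from the question, here is a minimal sketch. The helper name leading_digits_reversed is made up for illustration, and the trailing $ anchor is an addition that rejects anything other than digits in the prefix.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): returns the leading digits,
# or undef when the input does not end in the known line number.
sub leading_digits_reversed {
    my ($input, $line_number) = @_;
    my $reversed_number = reverse $line_number;   # scalar context: string reversal
    # The reversed line number is anchored at the start; any digits that
    # follow it are the original prefix, backwards.
    if (reverse($input) =~ /^\Q$reversed_number\E(\d*)$/) {
        return scalar reverse $1;
    }
    return;
}

for my $input (qw(8675309 8008675309 18888675309 18675309 86753091)) {
    my $digits = leading_digits_reversed($input, '8675309');
    printf "%-12s %s\n", $input, defined $digits ? "'$digits'" : 'not a match';
}
```

Note the explicit scalar on the return value: without it, reverse in list context would reverse a one-element list and hand the capture back unchanged.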

The backtracking gain is smaller in this case than it would be if you had a general phone number extraction regex:

Regex:   /^(\d*)\d{7}$/
Sexeger: /^\d{7}(\d*)/
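The two general patterns can be compared side by side. This is only a sketch: the function names split_forward and split_reversed are hypothetical, and the reversed pattern gets an extra $ anchor and a capture group for the seven digits so that both functions return the same pair.

```perl
use strict;
use warnings;

# Forward version: \d* grabs every digit, then backs off until
# exactly seven digits are left for \d{7}.
sub split_forward {
    my ($input) = @_;
    return $input =~ /^(\d*)(\d{7})$/ ? ($1, $2) : ();
}

# Reversed version: the seven-digit line number now sits at the front,
# so the engine never has to give anything back.
sub split_reversed {
    my ($input) = @_;
    return () unless reverse($input) =~ /^(\d{7})(\d*)$/;
    return (scalar reverse($2), scalar reverse($1));
}

for my $input (qw(18008675309 8675309 12345)) {
    my @forward  = split_forward($input);
    my @reversed = split_reversed($input);
    print "$input: forward=[@forward] reversed=[@reversed]\n";
}
```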

There is a whole class of problems where this technique is useful. For more info, see the sexeger post on PerlMonks.

OTHER TIPS

my($digits,$number);
if ($input =~ /^(\d*)(8675309)$/) {
  ($digits,$number) = ($1,$2);
}

The * quantifier is greedy, but that means it matches as much as possible while still allowing the overall match to succeed. So initially, yes, \d* tries to gobble up all the digits in $input, but it reluctantly gives back what it has matched, character by character, until the whole pattern matches.
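A small sketch of that greedy behaviour, wrapping the same pattern in a hypothetical split_number function. The input 86753098675309 is an invented test case that contains the line number twice, which shows that \d* keeps everything except the final occurrence.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above: returns
# (leading digits, line number), or an empty list on failure.
sub split_number {
    my ($input) = @_;
    return $input =~ /^(\d*)(8675309)$/ ? ($1, $2) : ();
}

for my $input (qw(8675309 8008675309 86753098675309 86753091)) {
    my ($digits, $number) = split_number($input);
    if (defined $number) {
        print "$input: digits='$digits' number='$number'\n";
    }
    else {
        print "$input: not a match\n";
    }
}
```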

Another approach is to chop off the tail:

(my $digits = $input) =~ s/8675309$//;
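On its own, that one-liner leaves $digits equal to the whole input when the number is not at the end. Since s/// returns a true value only when it replaced something, the result can double as the match test. A sketch, with strip_line_number as a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the leading part of the input, or undef
# when the input does not end in 8675309.
sub strip_line_number {
    my ($input) = @_;
    (my $digits = $input) =~ s/8675309$//
        or return;    # nothing was substituted, so this is not a match
    return $digits;
}

for my $input (qw(8675309 8008675309 86753091)) {
    my $digits = strip_line_number($input);
    print "$input: ", (defined $digits ? "'$digits'" : 'not a match'), "\n";
}
```

Unlike the capturing regex, this version does not check that what remains consists only of digits.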

You could do the same without using a regular expression:

my $digits = $input;
substr($digits, -7) = "";

The above, at least with perl 5.10.1, could even be condensed to:

substr(my $digits = $input, -7) = "";
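The substr versions remove the last seven characters whatever they are, so they only make sense once the input is known to end in the line number. A sketch that adds that check without a regex (the helper name and the $line_number parameter are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: regex-free check and split. Returns the leading
# part, or undef when the input does not end in $line_number.
sub leading_digits_substr {
    my ($input, $line_number) = @_;
    my $tail_length = length $line_number;
    return if length($input) < $tail_length;
    return if substr($input, -$tail_length) ne $line_number;
    return substr($input, 0, length($input) - $tail_length);
}

for my $input (qw(8675309 18675309 86753091 309)) {
    my $digits = leading_digits_substr($input, '8675309');
    print "$input: ", (defined $digits ? "'$digits'" : 'not a match'), "\n";
}
```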

The regex special variables $` and $& are another way of grabbing those pieces of information. They hold the contents of the data preceding the match and the match itself respectively.

if ( /8675309$/ ) {
    printf( "%s,%s,%s\n", $_, $`, $& );
}
else {
    printf( "%s,Not a match\n", $_ );
}
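On Perl versions before 5.20, mentioning $` or $& anywhere in a program slows down every regex match in it. From Perl 5.10 onward, the /p modifier with ${^PREMATCH} and ${^MATCH} gives the same information without that cost. A sketch under that assumption, with describe as a hypothetical function name:

```perl
use strict;
use warnings;

# Hypothetical helper: same output format as the snippet above, but
# built with the /p variables instead of $` and $&.
sub describe {
    my ($input) = @_;
    if ($input =~ /8675309$/p) {
        return sprintf '%s,%s,%s', $input, ${^PREMATCH}, ${^MATCH};
    }
    return sprintf '%s,Not a match', $input;
}

print describe($_), "\n" for qw(8008675309 8675309 86753091);
```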

There's a Perl package that deals with at least UK and US phone numbers.

It's called Number::Phone and is available on CPAN.

How about /(\d)?(8675309)/?

UPDATE: Whoops, that should have been /(\d*)(8675309)/

I might not understand the problem. Why is there a difference between the first and fourth examples:

'8675309'    ''   '8675309'  
...  
'8675309'    '1'  '8675309'

If all you want is to separate the last seven digits from everything else, you could have said it that way rather than provide confusing examples. A regex for that would be:

/(\d*)(\d{7,7})$/

If you weren't just providing a hypothetical number, and really are only looking for lines with '8675309' (seems strange), replace the '\d{7,7}' with '8675309'.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow