Domanda

I'm looking for a regular expression that will behave as follows:

input: "hello world."

output: he, el, ll, lo, wo, or, rl, ld

my idea was something along the lines of

    while($string =~ m/(([a-zA-Z])([a-zA-Z]))/g) {
        print "$1-$2 ";
    }

But that does something a little bit different.

È stato utile?

Soluzione

It's tricky. You have to capture it, save it, and then force a backtrack.

You can do that this way:

use v5.10;   # first release with backtracking control verbs

my $string = "hello, world!";
my @saved;

my $pat = qr{
    ( \pL {2} )
    (?{ push @saved, $^N })
    (*FAIL)
}x;

@saved = ();
$string =~ $pat;
my $count = @saved;
printf "Found %d matches: %s.\n", $count, join(", " => @saved);

produces this:

Found 8 matches: he, el, ll, lo, wo, or, rl, ld.

If you do not have v5.10, or you have a headache, you can use this:

my $string = "hello, world!";
my @pairs = $string =~ m{
  # we can only match at positions where the
  # following sneak-ahead assertion is true:
    (?=                 # zero-width look ahead
        (               # begin stealth capture
            \pL {2}     #       save off two letters
        )               # end stealth capture
    )
  # succeed after matching nothing, force reset
}xg;

my $count = @pairs;
printf "Found %d matches: %s.\n", $count, join(", " => @pairs);

That produces the same output as before.

But you might still have a headache.

Altri suggerimenti

No need "to force backtracking"!

push @pairs, "$1$2" while /([a-zA-Z])(?=([a-zA-Z]))/g;

Though you might want to match any letter rather than the limited set you specified.

push @pairs, "$1$2" while /(\pL)(?=(\pL))/g;

Yet another way to do it. Doesn't use any regexp magic, it does use nested maps but this could easily be translated to for loops if desired.

#!/usr/bin/env perl

use strict;
use warnings;

my $in = "hello world.";
my @words = $in =~ /(\b\pL+\b)/g;

my @out = map {
  my @chars = split '';
  map { $chars[$_] . $chars[$_+1] } ( 0 .. $#chars - 1 );
} @words;

print join ',', @out;
print "\n";

Again, for me this is more readable than a strange regex, YMMV.

I would use captured group in lookahead..

(?=([a-zA-Z]{2}))
    ------------
         |->group 1 captures two English letters 

try it here

You can do this by looking for letters and using the pos function to make use of the position of the capture, \G to reference it in another regex, and substr to read a few characters from the string.

use v5.10;
use strict;
use warnings;

my $letter_re = qr/[a-zA-Z]/;

my $string = "hello world.";
while( $string =~ m{ ($letter_re) }gx ) {
    # Skip it if the next character isn't a letter
    # \G will match where the last m//g left off.
    # It's pos() in a regex.
    next unless $string =~ /\G $letter_re /x;

    # pos() is still where the last m//g left off.
    # Use substr to print the character before it (the one we matched)
    # and the next one, which we know to be a letter.
    say substr $string, pos($string)-1, 2;
}

You can put the "check the next letter" logic inside the original regex with a zero-width positive assertion, (?=pattern). Zero-width meaning it is not captured and does not advance the position of a m//g regex. This is a bit more compact, but zero-width assertions get can get tricky.

while( $string =~ m{ ($letter_re) (?=$letter_re) }gx ) {
    # pos() is still where the last m//g left off.
    # Use substr to print the character before it (the one we matched)
    # and the next one, which we know to be a letter.
    say substr $string, pos($string)-1, 2;
}

UPDATE: I'd originally tried to capture both the match and the look ahead as m{ ($letter_re (?=$letter_re)) }gx but that didn't work. The look ahead is zero-width and slips out of the match. Other's answers showed that if you put a second capture inside the look-ahead then it can collapse to just...

say "$1$2" while $string =~ m{ ($letter_re) (?=($letter_re)) }gx;

I leave all the answers here for TMTOWTDI, especially if you're not a regex master.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top