Question

As detailed in perlpacktut, you can use an unpack template of the form X/Y* to first read the length of a byte stream and then read exactly that many bytes. However, I'm struggling to find anything like that within a regular expression for, say, plain ASCII numbers and strings. For example, a Bencoded string has the form:

[length]:[bytes]
4:spam
4:spam10:green eggs

I remember pulling this off once, but only with (??{ ... }), and I don't have the code handy right now. Can this be done without (??{ ... }) (which is super experimental), using one of the newer 5.10 captures or backreferences?

The obvious expression doesn't work:

/(\d+)\:(.{\1})/g
/(\d+)\:(.{\g-1})/g

Solution

Do it with a regular expression that uses the /g flag and the \G anchor, but in scalar context. Scalar-context /g remembers the position in the string right after the last successful match (or the beginning of the string for the first one), so you can walk along the string this way. Get the length, skip over the colon, and then use substr to pick up the right number of characters. You can assign to pos, so advance it past the characters you just extracted. Redo that until you have no more matches:

use v5.10.1;

LINE: while( my $line = <DATA> ) {
    chomp( $line );
    {
        say $line;
        next LINE unless $line =~ m/\G(\d+):/g;  # scalar /g!
        say "\t1. pos is ", pos($line);
        my( $length, $string ) = ( $1, substr $line, pos($line), $1 );
        pos($line) += $length;  # skip over the string we just extracted
        say "\t2. pos is ", pos($line);
        print "\tFound length $length with [$string]\n";
        redo;
    }
}

__END__
4:spam6:Roscoe
6:Buster10:green eggs
4:abcd5:123:44:Mimi

Notice the edge case in the last input line. The 3: inside 123:4 is part of that five-character string, not the start of a new record. My output is:

4:spam6:Roscoe
    1. pos is 2
    2. pos is 6
    Found length 4 with [spam]
4:spam6:Roscoe
    1. pos is 8
    2. pos is 14
    Found length 6 with [Roscoe]
4:spam6:Roscoe
6:Buster10:green eggs
    1. pos is 2
    2. pos is 8
    Found length 6 with [Buster]
6:Buster10:green eggs
    1. pos is 11
    2. pos is 21
    Found length 10 with [green eggs]
6:Buster10:green eggs
4:abcd5:123:44:Mimi
    1. pos is 2
    2. pos is 6
    Found length 4 with [abcd]
4:abcd5:123:44:Mimi
    1. pos is 8
    2. pos is 13
    Found length 5 with [123:4]
4:abcd5:123:44:Mimi
    1. pos is 15
    2. pos is 19
    Found length 4 with [Mimi]
4:abcd5:123:44:Mimi
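
If you only want the strings and not the tracing output, the same walk can be wrapped in a sub. This is a minimal sketch, not what any module does: the name extract_strings is made up for illustration, and dying on a record that claims more characters than remain is an assumption about how you want truncated input handled.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the strings found in a run of
# length-prefixed records such as "4:spam6:Roscoe".
sub extract_strings {
    my ($input) = @_;
    my @strings;

    # Scalar-context /g with \G resumes where the previous match ended.
    while ( $input =~ m/\G(\d+):/g ) {
        my $length = $1;
        my $start  = pos($input);

        # Assumption: a record that claims more characters than remain is an error.
        die "Record at offset $start is truncated\n"
            if $start + $length > length $input;

        push @strings, substr( $input, $start, $length );

        # Skip the payload so the next \G match starts at the next record.
        pos($input) = $start + $length;
    }

    return @strings;
}

print join( '|', extract_strings('4:abcd5:123:44:Mimi') ), "\n";
```

For the edge-case line this prints abcd|123:4|Mimi. The loop stops quietly at the first position that does not look like a length prefix, so check pos yourself if you need to detect trailing junk.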

I figured there might be a module for this, and there is: Bencode. It does what I did. That means I did a lot of work for nothing. Always look at CPAN first. Even if you don't use the module, you can look at their solution :)

Other tips

No, I don't think that it's possible without the use of (??{ ... }), which would be:

/(\d++):((??{".{$^N}"}))/sg
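
To see that pattern at work, here is a small sketch that runs it in list context, where /g returns every capture as a flat list of length/string pairs. The sub name pairs_via_postponed_regex and the sample input are only for illustration. Note that the pattern has no \G, so it would silently skip over anything between records that does not match.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above.
sub pairs_via_postponed_regex {
    my ($input) = @_;

    # $^N holds the text of the most recently closed group, here the digits.
    # (??{ ... }) builds the subpattern ".{N}" at match time from that value.
    my @captures = $input =~ /(\d++):((??{".{$^N}"}))/sg;

    return @captures;    # length, string, length, string, ...
}

my @fields = pairs_via_postponed_regex('4:abcd5:123:44:Mimi');
while ( my ( $length, $string ) = splice @fields, 0, 2 ) {
    print "Found length $length with [$string]\n";
}
```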
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow