Question

I have a text file like this:

mc1s2  L#'|NA|det indice|indice|nc Sensex|NA|adj
progressait|progresser|v de|de|prep

and another text file like this:

programmer:_[1]_:_P0_(P1)=1
progresser:_[1]_:_P0=1
prohiber:_[1]_:_P0_P1=1
projeter:_[3]_:_P0_P1=1;_:_P0_P1_(PL)=1;_:_P0_P1_(PP<sur>)=1

I would like to perform a replacement that produces a third text file like this:

mc1s2  L#'|NA|det indice|indice|nc Sensex|NA|adj
progresser:_[1]_:_P0=1 de|de|prep

As you can see, I'd like to replace progressait|progresser|v with progresser:_[1]_:_P0=1.

I would like to do this for all verbs.

This script does what I need, but I can't understand the last part of it:

use strict;
use warnings;
use autodie;

my $lookupfile = 'lookup.txt';
# Contains:
# programmer:_[1]_:_P0_(P1)=1
# progresser:_[1]_:_P0=1 
# prohiber:_[1]_:_P0_P1=1
# projeter:_[3]_:_P0_P1=1;_:_P0_P1_(PL)=1;_:_P0_P1_(PP<sur>)=1

my $datafile = 'data.txt';
# Contains:
# mc1s2  L#'|NA|det indice|indice|nc Sensex|NA|adj progressait|progresser|v de|de|prep 

my %lookup;
open my $fh, '<', $lookupfile;
while (<$fh>) {
    chomp;
    my ($field) = split ':';
    $lookup{$field} = $_;
}

# use Data::Dump; # Used to debug the lookup table.
# dd \%lookup;

open $fh, '<', $datafile;
while (<$fh>) {
    s{(?<=\s)(\S+)}{
        my $entry = $1;
        my @fields = split '\|', $entry;
        $lookup{$fields[1]} // $entry;
    }eg;

    print;
}

I can't understand this part:

open $fh, '<', $datafile;
while (<$fh>) {
    s{(?<=\s)(\S+)}{
        my $entry = $1;
        my @fields = split '\|', $entry;
        $lookup{$fields[1]} // $entry;
    }eg;

Can you help me?


Solution

This substitution

s{(?<=\s)(\S+)}{
    my $entry = $1;
    my @fields = split '\|', $entry;
    $lookup{$fields[1]} // $entry;
}eg;

uses the /e modifier, which indicates that the replacement is not used directly as a string but is executed as Perl code, and the value that code returns replaces the match. The /g modifier repeats this for every match on the line.
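
To make the effect of /e concrete, here is a minimal sketch that runs the same substitution with and without it. The sample string and the variable names are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample string, not taken from the question's data.
my $text = 'a 2 b 10';

# Without /e the replacement is an interpolated string: "2" becomes "2 * 2".
(my $plain = $text) =~ s/(\d+)/$1 * 2/g;

# With /e the replacement is evaluated as Perl code: "2" becomes "4".
(my $evaluated = $text) =~ s/(\d+)/$1 * 2/eg;

print "$plain\n";        # a 2 * 2 b 10 * 2
print "$evaluated\n";    # a 4 b 20
```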

  • The match finds the next sequence of non-whitespace characters that follows a whitespace character. The first token, mc1s2, is skipped because nothing precedes it, so in this case $1 is initially set to L#'|NA|det

  • $1 is copied to $entry, and $entry is split on the pipe characters | into @fields

  • The %lookup hash is indexed with $fields[1] - the second entry in @fields. Here that is the string NA

  • The code block returns the value of that hash element, or the whole of $entry if there is no hash element with that key (// is the defined-or operator, available since Perl 5.10). Because $entry is the whole matched string, a token with no corresponding element in %lookup is replaced with itself and therefore left unchanged
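
The steps above can be traced with a small self-contained sketch. It assumes a lookup table written by hand instead of read from lookup.txt, and it adds a @trace array and a $key variable (both hypothetical, for illustration only) so that each match can be inspected. Unlike the original, it defaults a missing second field to an empty string to avoid an "uninitialized value" warning on tokens without a pipe.

```perl
use strict;
use warnings;

# Lookup table built by hand here; the original script reads it from lookup.txt.
my %lookup = (
    programmer => 'programmer:_[1]_:_P0_(P1)=1',
    progresser => 'progresser:_[1]_:_P0=1',
);

my $line = q{mc1s2  L#'|NA|det indice|indice|nc Sensex|NA|adj progressait|progresser|v de|de|prep};

my @trace;    # one [token, key, replacement] triple per match
(my $result = $line) =~ s{(?<=\s)(\S+)}{
    my $entry  = $1;
    my @fields = split /\|/, $entry;       # word form, lemma, part of speech
    my $key    = $fields[1] // '';         # lemma, or '' when the token has no pipe
    my $new    = $lookup{$key} // $entry;  # fall back to the unchanged token
    push @trace, [ $entry, $key, $new ];
    $new;                                  # last value is the replacement text
}eg;

# Show what happened to each matched token, then the rewritten line.
printf "%-28s key=%-12s -> %s\n", @$_ for @trace;
print "$result\n";
```

Only progressait|progresser|v has a second field that exists as a key in %lookup, so it is the only token that changes; every other token is replaced with itself.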

I hope this helps

Licensed under: CC-BY-SA with attribution