Question

I am splitting a text file into blocks so that I can use a regular expression to extract the blocks that do not contain a certain line. The text file looks like this:

[Term]  
id: id1  
name: name1  
xref: type1:aab  
xref: type2:cdc  

[Term]  
id: id2  
name: name2  
xref: type1:aba  
xref: type3:fee 

A few days ago someone showed me how to extract the blocks that do contain a line matching a certain regular expression (for example "xref: type3"):

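while (<MYFILE>) {
    BEGIN { $/ = q|| }    # paragraph mode: each read returns one blank-line-separated block
    my @lines = split /\n/;
    for my $line ( @lines ) {
        if ( $line =~ m/xref:\s*type3/ ) {
            printf NEWFILE qq|%s|, $_;    # print the whole block once, then stop checking it
            last;
        }
    }
}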
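The loop above depends on setting $/ to an empty string (paragraph mode), so that each read returns a whole block rather than a single line. A minimal sketch of just that step, assuming the sample data from the question and an in-memory filehandle in place of MYFILE so it runs without a real file:

```perl
use strict;
use warnings;

# Sample data copied from the question, read through an in-memory filehandle
# (an assumption so the sketch needs no file on disk).
my $text = <<'END';
[Term]
id: id1
name: name1
xref: type1:aab
xref: type2:cdc

[Term]
id: id2
name: name2
xref: type1:aba
xref: type3:fee
END

open my $in, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '';    # paragraph mode: one blank-line-separated block per read
    while ( my $block = <$in> ) {
        push @records, $block;
    }
}
close $in;

printf "%d blocks read\n", scalar @records;    # prints "2 blocks read"
```

Using local inside a bare block restores the normal line-by-line $/ afterwards, which is usually safer than changing it globally in a BEGIN block.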

Now I want to write all blocks that do not contain "xref: type3" to a new file. I tried to do this by simply negating the regex:

if ( $line !~ m/xref:\s*type3/ )

or, alternatively, by replacing the if statement with unless:

unless ( $line =~ m/xref:\s*type3/ )

Unfortunately it doesn't work: the output file is the same as the original one. Any ideas what I'm doing wrong?


Solution

You have:

For every line, print this block if this line doesn't match the pattern.

But you want:

For every line, print this line if none of the lines in the block match the pattern.
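The "you have" behaviour can be traced on the second block, the one that does contain "xref: type3". This small sketch (the $block string is copied from the question's sample data) applies the negated test line by line and records which line triggers it:

```perl
use strict;
use warnings;

# Second block from the question: it DOES contain "xref: type3".
my $block = "[Term]\nid: id2\nname: name2\nxref: type1:aba\nxref: type3:fee\n";

my $trigger;    # the first line for which the negated test succeeds
for my $line ( split /\n/, $block ) {
    # Negated test from the question: true for any line that is not the xref line
    if ( $line !~ m/xref:\s*type3/ ) {
        $trigger = $line;
        last;
    }
}

print "Block printed because of line: $trigger\n";    # Block printed because of line: [Term]
```

The header line "[Term]" never matches, so every block gets printed before its xref lines are even looked at.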

As such, you can't start printing the block until you have examined every line in it (or at least all lines up to the first matching one).

local $/ = q||;
while (<MYFILE>) {
    my @lines = split /\n/;

    my $skip = 0;
    for my $line ( @lines ) {
        if ( $line =~ m/^xref:\s*type3/ ) {
            $skip = 1; 
            last;
        }
    }

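    if (!$skip) {
        # split /\n/ removed the newlines, so put them back,
        # then restore the blank line that separates blocks
        for my $line ( @lines ) {
            print NEWFILE "$line\n";
        }
        print NEWFILE "\n";
    }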
}
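The same two-pass idea can be written more compactly with grep, which in scalar context returns the number of matching lines; a non-zero count means the block is skipped. This is a variant sketch rather than the answer's own code, and the sample text plus the in-memory filehandles $in and $out are assumptions so it runs without real files:

```perl
use strict;
use warnings;

my $text = <<'END';
[Term]
id: id1
name: name1
xref: type1:aab
xref: type2:cdc

[Term]
id: id2
name: name2
xref: type1:aba
xref: type3:fee
END

open my $in,  '<', \$text      or die "in-memory read: $!";
open my $out, '>', \my $result or die "in-memory write: $!";

{
    local $/ = '';    # paragraph mode
    while ( my $block = <$in> ) {
        my @lines = split /\n/, $block;
        # Skip the block if any of its lines starts with "xref: type3"
        next if grep { /^xref:\s*type3/ } @lines;
        print {$out} $block;    # print the untouched block, newlines included
    }
}
close $in;
close $out;

print $result;    # only the first [Term] block remains
```

Printing $block instead of the split lines keeps the original newlines and the blank separator intact.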

But there's no need to split into lines. We can check and print the whole block at once.

local $/ = q||;
while (<MYFILE>) {
    print NEWFILE $_ if !/^xref:\s*type3/m;
}

(Note the /m to make ^ match the start of any line.)
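A quick sketch of what the /m modifier changes, using the second block from the question as the test string:

```perl
use strict;
use warnings;

my $block = "[Term]\nid: id2\nname: name2\nxref: type1:aba\nxref: type3:fee\n";

# Without /m, ^ anchors only at the start of the whole string ("[Term]...")
my $without_m = $block =~ /^xref:\s*type3/  ? 1 : 0;

# With /m, ^ also matches right after each embedded newline
my $with_m    = $block =~ /^xref:\s*type3/m ? 1 : 0;

print "without /m: $without_m, with /m: $with_m\n";    # without /m: 0, with /m: 1
```

Without /m the anchored pattern would never match a block, so every block would be printed, which is exactly the symptom in the question.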

OTHER TIPS

The problem is that you are using unless together with !~, which reads as "if $line does not NOT match, do this" (a double negative).

When I use the unless block with the normal pattern-matching operator =~, your code works: I see the first block as output because it does not contain type3.

LOOP:
while (<$MYFILE>) {
    BEGIN { $/ = q|| }
    my @lines = split /\n/;
    for my $line ( @lines ) {
        unless ( $line =~ m/xref:\s*type3/ ) {
            printf qq|%s|, $_;
            last LOOP;    # leaves the outer while loop, so at most one block is printed
        }
    }
}

# prints
# [Term]
# id: id1
# name: name1
# xref: type1:aab
# xref: type2:cdc

Don't process the records line by line; use paragraph mode:

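{
    local $/ = q();    # paragraph mode: one blank-line-separated block per read
    while (<MYFILE>) {
        # Keep every block that contains no "xref: type3" line
        if ( !/xref:\s*type3/ ) {
            print NEWFILE $_;
        }
    }
}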
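
The paragraph-mode loop can also be wrapped in a reusable subroutine with lexical filehandles. This is a sketch only: filter_blocks is a hypothetical name, the file names in the usage comment are placeholders, and the demonstration uses in-memory filehandles with the question's sample data:

```perl
use strict;
use warnings;

# Hypothetical helper: copy every blank-line-separated block from $in to $out
# unless the block matches $pattern. Returns the number of blocks kept.
sub filter_blocks {
    my ( $in, $out, $pattern ) = @_;
    local $/ = '';    # paragraph mode, restored automatically when the sub returns
    my $kept = 0;
    while ( my $block = <$in> ) {
        next if $block =~ $pattern;
        print {$out} $block;
        $kept++;
    }
    return $kept;
}

# With real files (names are placeholders):
#   open my $in,  '<', 'terms.txt'    or die "terms.txt: $!";
#   open my $out, '>', 'filtered.txt' or die "filtered.txt: $!";
#   filter_blocks( $in, $out, qr/^xref:\s*type3/m );

my $text = <<'END';
[Term]
id: id1
name: name1
xref: type1:aab
xref: type2:cdc

[Term]
id: id2
name: name2
xref: type1:aba
xref: type3:fee
END

open my $in,  '<', \$text      or die "in-memory read: $!";
open my $out, '>', \my $result or die "in-memory write: $!";
my $kept = filter_blocks( $in, $out, qr/^xref:\s*type3/m );
close $in;
close $out;

print "kept $kept block(s)\n";    # kept 1 block(s)
print $result;
```

Passing the pattern as a qr// object keeps its /m flag, so the same helper can filter on any other xref type.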
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow