Question

I need to write a Perl script to read in a file, and delete anything inside < >, even if they're on different lines. That is, if the input is:

Hello, world. I <enjoy eating
bagels. They are quite tasty.
I prefer when I ate a bagel to
when I >ate a sandwich. <I also
like >bananas.

I want the output to be:

Hello, world. I ate a sandwich. bananas.

I know how to do this with a regex if the text is on one line, but I don't know how to do it when the text spans multiple lines. Ultimately I need to conditionally delete parts of a template so that I can generate parametrized config files. I thought Perl would be a good language for this, but I am still getting the hang of it.

Edit: It also needs to handle more than one instance of < >.

Solution

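local $/;                 # slurp mode: read the whole input at once
my $text = <>;
$text =~ s/<.*?>//gs;     # /s lets . match newlines; /g removes every pair
print $text;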

OTHER TIPS

You may want to check out the Perl module Text::Balanced, which is part of the core distribution; I think it will help you. In general, avoid regexes for this kind of task if the text is likely to contain nested delimiters, because that gets messy quickly.
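To illustrate the nested case, here is a minimal sketch using extract_bracketed from Text::Balanced. The helper name strip_balanced and the sample string are made up for this example, and the prefix pattern '[^<]*' is just one way to skip the text in front of each bracket:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: remove every balanced < ... > group, including
# groups that contain nested < > pairs.
sub strip_balanced {
    my ($text) = @_;
    my $out = '';
    while (length $text) {
        # In list context extract_bracketed returns
        # (extracted, remainder, skipped prefix) and leaves $text alone.
        my ($tag, $rest, $prefix) = extract_bracketed($text, '<>', '[^<]*');
        if (defined $tag) {
            $out .= $prefix;   # keep the text before the group
            $text = $rest;     # continue after the group
        }
        else {
            $out .= $text;     # no further balanced group: keep the rest
            last;
        }
    }
    return $out;
}

print strip_balanced("keep <drop <inner> drop> keep\n");
```

On the same string, a plain s/<.*?>//gs would stop at the first > and leave " drop>" behind, which is the messiness mentioned above.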

In Perl:

#!/usr/bin/perl
use strict;
use warnings;

local $/;                 # read the whole input, not just the first line
my $text = <>;
$text =~ s/<[^>]*>//g;
print $text;

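The regex matches everything from a < through the first > (inclusive) and replaces it with nothing. Because [^>] also matches newlines, a pair can span several lines, as long as the whole input has been read into $text. The g flag makes the substitution global, so every pair is removed rather than only the first.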
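As a small illustration of why the newline handling matters, the sketch below runs three variants of the substitution on a made-up two-tag string (the variable names are only for this example):

```perl
use strict;
use warnings;

# Made-up input in which both < > pairs span a line break.
my $input = "I <enjoy\nbagels >ate. <I also\nlike >bananas.\n";

# [^>] matches any character except >, including newlines.
(my $negated = $input) =~ s/<[^>]*>//g;

# .*? needs the /s flag before . will match a newline.
(my $dot_s = $input) =~ s/<.*?>//gs;

# Without /s the dot stops at the line break, so nothing matches.
(my $dot_plain = $input) =~ s/<.*?>//g;

print "negated class: $negated";
print "dot with /s:   $dot_s";
print "dot, no /s:    ", ($dot_plain eq $input ? "unchanged\n" : $dot_plain);
```

The first two variants give the same result here; they only differ in how they get across the line break.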

EDIT: incorporated comments from Hynek and chaos

Inefficient one-liner (it slurps the whole input into memory):

perl -0777 -pe 's/<.*?>//gs'

which is equivalent to this program:

local $/;
my $text = <>;
$text =~ s/<.*?>//gs;
print $text;

Whether that is acceptable depends on how large the input is. Here is a more efficient one-liner that processes the input line by line:

perl -pe 'if ($a) {(s/.*?>// and do {s/<.*?>//g; $a = s/<.*//s;1}) or $_=q{}} else {s/<.*?>//g; $a = s/<.*//s}'

which is equivalent to this program:

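my $in_tag;    # true while we are inside an unclosed < ... >
while (<>) {
    if ($in_tag) {
        if (s/.*?>//) {             # the open tag closes on this line
            s/<.*?>//g;             # remove complete pairs on the rest of the line
            $in_tag = s/<.*//s;     # a trailing < opens a new multi-line tag
        }
        else { $_ = q{} }           # still inside the tag: drop the whole line
    }
    else {
        s/<.*?>//g;
        $in_tag = s/<.*//s;
    }
    print;
}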
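
As a rough check, the sketch below wraps both approaches in functions and runs them on the sample from the question. The function names strip_slurped and strip_streaming are invented for this example, and the in-memory filehandle only stands in for a real input file:

```perl
use strict;
use warnings;

# Whole-text approach: one substitution over the slurped input.
sub strip_slurped {
    my ($text) = @_;
    $text =~ s/<.*?>//gs;
    return $text;
}

# Line-by-line approach: $in_tag remembers whether a < is still open.
sub strip_streaming {
    my ($fh) = @_;
    my ($in_tag, $out) = (0, '');
    while (my $line = <$fh>) {
        if ($in_tag) {
            if ($line =~ s/.*?>//) {
                $line =~ s/<.*?>//g;
                $in_tag = ($line =~ s/<.*//s);
            }
            else { $line = '' }
        }
        else {
            $line =~ s/<.*?>//g;
            $in_tag = ($line =~ s/<.*//s);
        }
        $out .= $line;
    }
    return $out;
}

my $sample = <<'END';
Hello, world. I <enjoy eating
bagels. They are quite tasty.
I prefer when I ate a bagel to
when I >ate a sandwich. <I also
like >bananas.
END

# Read the sample through an in-memory filehandle, as if it were a file.
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my $streamed = strip_streaming($fh);
close $fh;

print $streamed;
print strip_slurped($sample) eq $streamed ? "same result\n" : "different result\n";
```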
Licensed under: CC-BY-SA with attribution