What you need is the -0777 switch, which causes the entire file to be read into a single string. Without it, the file is read line by line, and a regex can never match a statement that spans several lines.
Also, as Andomar points out, you are missing the -p switch, but I assume you figured that out.
The modifiers on the regex won't matter in this case, except for the /g modifier. /m only affects ^ and $, and /s causes the wildcard . to also match newlines. Neither applies to your regex, which uses none of ^, $ or . (the negated class [^>] already matches newlines).
So basically, you want something like:
perl -0777 -pi -e 's/<![^>]+>//g' ...
Side note:
HTML should ideally be handled with a parser, so I spent a few minutes working with HTML::Parser, which can conveniently strip declarations by setting an empty handler for them. Something like this seems to print OK for a single file:
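perl -MHTML::Parser -we '
# Print every event as its original text by default
my $p = HTML::Parser->new(default_h => [sub { print @_ }, "text"]);
# An empty handler makes the parser skip declarations
$p->handler(declaration => "");
$p->parse_file(shift) or die $!;' yourfile.html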
I figured it would be overkill, so I gave up on making it work with the -pi in-place edit switches, but it is (probably) easily implemented in a script.