How can I extract lines of text from a file?
Question
I have a directory full of files and I need to pull the headers and footers off of them. They are all variable length so using head or tail isn't going to work. Each file does have a line I can search for, but I don't want to include the line in the results.
The content I want usually starts after a line like
*** Start (more text here)
and ends before a line like
*** Finish (more text here)
I want the file names to stay the same, so I need to overwrite the originals, or write to a different directory and I'll overwrite them myself.
Oh yeah, it's on a Linux server of course, so I have Perl, sed, awk, grep, etc.
Solution
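Try the flip-flop operator, "..". It becomes true on the line that matches the first pattern and stays true through the line that matches the second.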
# flip-flop.pl
use strict;
use warnings;
my $start = qr/^\*\*\* Start/;
my $finish = qr/^\*\*\* Finish/;
while ( <> ) {
    if ( /$start/ .. /$finish/ ) {
        next if /$start/ or /$finish/;
        print $_;
    }
}
You can then use Perl's -i switch to update your file(s) in place, like so:
$ perl -i'copy_*' flip-flop.pl data.txt
...which changes data.txt but makes a copy beforehand as "copy_data.txt".
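If you would rather write the stripped files to a different directory and overwrite the originals yourself, the same flip-flop test can be driven from a small shell loop. This is only a sketch: the function name strip_to_dir and its two directory arguments are made up for illustration, and it assumes each file contains both marker lines.

```shell
#!/bin/sh
# Hypothetical helper: copy every regular file from src_dir to out_dir,
# keeping only the lines strictly between the Start and Finish markers.
strip_to_dir() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue
        # Same logic as flip-flop.pl, inlined as a one-liner.
        perl -ne 'if (/^\*\*\* Start/ .. /^\*\*\* Finish/) { print unless /^\*\*\* (?:Start|Finish)/ }' \
            "$f" > "$out_dir/$(basename "$f")"
    done
}
```

Calling strip_to_dir ./reports ./stripped (both paths are placeholders) leaves the originals untouched, so you can compare before copying anything back.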
OTHER TIPS
GNU coreutils are your friend...
csplit inputfile '%^\*\*\* Start%+1' '/^\*\*\* Finish/' '%%' '{*}'
This produces your desired file as xx00. You can change this behaviour through the options --prefix, --suffix-format, and --digits, but see the manual for yourself. Since csplit is designed to produce a number of files, it is not possible to produce a file without a suffix, so you will have to do the overwriting manually or through a script:
csplit "$1" '%^\*\*\* Start%+1' '/^\*\*\* Finish/' '%%' '{*}'
mv -f xx00 "$1"
Add loops as you desire.
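One possible loop, as a rough sketch rather than a tested recipe. It assumes GNU csplit, and the function name strip_with_csplit and the "piece" prefix are invented for the example. Instead of discarding the footer with a trailing pattern, it lets csplit write the leftover piece into a scratch directory and deletes it afterwards.

```shell
#!/bin/sh
# Hypothetical helper: strip header and footer from every regular file in a
# directory, using csplit in a scratch directory.
strip_with_csplit() {
    dir=$1
    tmp=$(mktemp -d) || return 1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # -s: do not print piece sizes; -f: prefix for the output pieces.
        # piece00 = lines between the markers, piece01 = Finish line and footer.
        if csplit -s -f "$tmp/piece" "$f" '%^\*\*\* Start%+1' '/^\*\*\* Finish/'; then
            # Overwrite through redirection so the original keeps its permissions.
            cat "$tmp/piece00" > "$f"
        else
            echo "skipped $f (markers not found)" >&2
        fi
        rm -f "$tmp"/piece*
    done
    rmdir "$tmp"
}
```

A file that lacks either marker makes csplit fail, and the sketch then leaves that file as it was.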
To strip the header (print everything after the Start line):
awk '{if (d > 0) print $0} /.*Start.*/ {d = 1}' yourFileHere
To strip the footer (print everything before the Finish line):
awk '/.*Finish.*/ {d = 1} {if (d < 1) print $0}' yourFileHere
To get just the part between the header and the footer, as you want:
awk '/.*Start.*/ {d = 1; next} /.*Finish.*/ {d = 0; next} {if (d > 0) print $0}' yourFileHere
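Since awk can read many files in one run, the same start/finish flag can be reset at the top of each file and the kept lines sent to a same-named file in another directory. The following is a sketch under assumptions: the function name strip_all_with_awk and the output directory are hypothetical, and the patterns are anchored to "*** Start" and "*** Finish" at the beginning of a line, as in the question.

```shell
#!/bin/sh
# Hypothetical helper: first argument is the output directory, the rest are
# input files. One awk process handles all of them.
strip_all_with_awk() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    awk -v out="$out_dir" '
        FNR == 1 {                          # first line of each input file
            if (dest != "") close(dest)     # finish the previous output file
            n = split(FILENAME, parts, "/")
            dest = out "/" parts[n]         # same base name, new directory
            printf "" > dest                # create the file even if nothing is kept
            keep = 0                        # reset the flag for the new file
        }
        /^\*\*\* Start/  { keep = 1; next } # start keeping after this line
        /^\*\*\* Finish/ { keep = 0; next } # stop keeping at this line
        keep { print > dest }
    ' "$@"
}
```

For example, strip_all_with_awk ./stripped ./reports/* (placeholder paths) writes one stripped copy per input file and does not touch the originals.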
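There's one more way, with the csplit command. Try something like:
csplit yourFileHere /Start/ /Finish/
Then examine the files named 'xxNN', where NN is a running number: xx00 holds the header, xx01 the part from the Start line up to the Finish line, and xx02 the Finish line and everything after it. Also take a look at the csplit manpage.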
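Maybe? Keep Start through Finish by deleting everything outside that range (note that the ! goes before the d):
$ sed -i '/^\*\*\* Start/,/^\*\*\* Finish/!d' *
or, to remove the Start and Finish lines as well, delete from the first line through the Start line and from the Finish line through the last line:
$ sed -i -e '1,/^\*\*\* Start/d' -e '/^\*\*\* Finish/,$d' *
The second form assumes the Start line is not the very first line of the file. Also, -i as used here is the GNU sed form; other builds of sed may want a backup suffix or may not support it at all.
I wrote this from (probably poor) memory, so try it on copies first.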
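A cautious way to check a sed expression before letting -i rewrite anything is to run it without -i and read the output. The sketch below does that for each file given. The function name preview_strip is made up, and the expression is an assumption about what is wanted: delete from line 1 through the Start line, and from the Finish line through the end of the file.

```shell
#!/bin/sh
# Hypothetical helper: print what would remain of each file, without
# modifying it. sed runs once per file, so 1 and $ refer to that file only.
preview_strip() {
    for f in "$@"; do
        echo "== $f"
        sed -e '1,/^\*\*\* Start/d' -e '/^\*\*\* Finish/,$d' "$f"
    done
}
```

Once the preview looks right, the same two -e expressions can be used with -i.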
A quick Perl hack, not tested. I am not fluent enough in sed or awk to get this effect with them, but I would be interested in how that would be done.
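#!/usr/bin/perl -w
use strict;
use Tie::File;
my $Filename=shift;
tie my @File, 'Tie::File', $Filename or die "could not access $Filename.\n";
# Drop lines from the top, up to and including the Start line.
# The parentheses are needed: shift and pop bind more loosely than !~.
while (@File and shift(@File) !~ /^\*\*\* Start/) {};
# Drop lines from the bottom, up to and including the Finish line.
while (@File and pop(@File) !~ /^\*\*\* Finish/) {};
untie @File;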
Some of the examples in perlfaq5: How do I change, delete, or insert a line in a file, or append to the beginning of a file? may help. You'll have to adapt them to your situation. Also, Leon's flip-flop operator answer is the idiomatic way to do this in Perl, although you don't have to modify the file in place to use it.
A Perl solution that overwrites the original file.
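#!/usr/bin/perl -ni
# In scalar context the range operator returns a sequence number: 1 on the
# Start line, and a value with "E0" appended (such as "5E0") on the Finish line.
if(my $num = /^\*\*\* Start/ .. /^\*\*\* Finish/) {
    # Skip the first line of the range and the "E0"-tagged last line.
    print if $num != 1 and $num + 0 eq $num;
}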