How can I read lines from the end of file in Perl?
08-07-2019
Question
I am working on a Perl script to read a CSV file and do some calculations. The CSV file has only two columns, something like below.
One Two
1.00 44.000
3.00 55.000
Now this CSV file can be very big, from 10 MB to 2 GB.
Currently I am working with a CSV file of about 700 MB. I tried to open this file in Notepad and Excel, but no program seems able to open it.
I want to read maybe the last 1000 lines from the CSV file and see the values. How can I do that? I cannot open the file in Notepad or any other program.
If I write a Perl script, do I need to process the complete file to get to the end and then read the last 1000 lines?
Is there any better way to do that? I am new to Perl and any suggestions will be appreciated.
I have searched the net and there are some modules available, like File::Tail, but I don't know whether they will work on Windows.
OTHER TIPS
The File::ReadBackwards module allows you to read a file in reverse order. This makes it easy to get the last N lines as long as you aren't order dependent. If you are, and the needed data is small enough (which it should be in your case), you could read the last 1000 lines into an array and then reverse it.
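A minimal sketch of that approach, assuming File::ReadBackwards is installed from CPAN (it is not a core module) and using a hypothetical filename:

```perl
use strict;
use warnings;
use File::ReadBackwards;   # CPAN module, not part of core Perl

my $file = 'myhugefile.csv';   # hypothetical filename
my $n    = 1000;

my $bw = File::ReadBackwards->new($file)
    or die "Can't read $file: $!";

# readline() hands back lines starting from the end of the file,
# so collect the first $n of them and reverse to restore file order.
my @lines;
while (@lines < $n and defined(my $line = $bw->readline)) {
    push @lines, $line;
}
print reverse @lines;
```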
This is only tangentially related to your main question, but when you want to check whether a module such as File::Tail works on your platform, check the results from CPAN Testers. The links at the top of the module page in CPAN Search lead you to the CPAN Testers reports and platform matrix.
Looking at the matrix, you can see that this module indeed has a problem on Windows on all versions of Perl tested.
I wrote a quick backward file search in pure Perl using the following code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my ($file, $num_of_lines) = @ARGV;
    my $count = 0;
    my $filesize = -s $file; # used to detect reaching the start of the file while reading backward
    my $offset = -2;         # skip the last two characters: the final \n (and a possible ^Z) at the end of file

    open my $fh, '<', $file or die "Can't read $file: $!\n";

    while (abs($offset) < $filesize) {
        my $line = "";
        # We have to watch for the start of the file ourselves: seeking with
        # whence 2 (SEEK_END) keeps returning data in reverse order even when
        # the offset runs past the beginning of the file.
        while (abs($offset) < $filesize) {
            seek $fh, $offset, 2;   # negative $offset with whence 2 seeks backward from the end
            $offset -= 1;           # move the cursor one character back
            my $char = getc $fh;
            last if $char eq "\n";  # a newline means we have caught a whole line
            $line = $char . $line;  # otherwise prepend the character to the current line
        }
        # got the next line (lines come out in reverse order)!
        print $line, "\n";
        # exit the loop once we have printed enough lines
        last if ++$count >= $num_of_lines;
    }
and run this script like:
$ get-x-lines-from-end.pl ./myhugefile.log 200
Without tail, a Perl-only solution isn't that unreasonable.
One way is to seek from the end of the file, then read lines from it. If you don't have enough lines, seek even further from the end and try again.
    sub last_x_lines {
        my ($filename, $lineswanted) = @_;
        my ($line, $filesize, $seekpos, $numread, @lines);

        open my $fh, '<', $filename or die "Can't read $filename: $!\n";

        $filesize = -s $filename;
        $seekpos  = 50 * $lineswanted;
        $seekpos  = $filesize if $seekpos > $filesize;  # never seek before the start of the file
        $numread  = 0;

        while ($numread < $lineswanted) {
            @lines = ();
            $numread = 0;
            seek($fh, $filesize - $seekpos, 0);
            <$fh> if $seekpos < $filesize; # discard a probably fragmentary line
            while (defined($line = <$fh>)) {
                push @lines, $line;
                shift @lines if ++$numread > $lineswanted;
            }
            if ($numread < $lineswanted) {
                # We didn't get enough lines. Double the amount of space to read from next time.
                if ($seekpos >= $filesize) {
                    die "There aren't even $lineswanted lines in $filename - I got $numread\n";
                }
                $seekpos *= 2;
                $seekpos = $filesize if $seekpos > $filesize;
            }
        }
        close $fh;
        return @lines;
    }
P.S. A better title would be something like "Reading lines from the end of a large file in Perl".
You could use the Tie::File module, I believe. It presents the lines of the file as an array (without loading the whole file into memory), so you could get the size of the array and process elements arraySize-1000 up to arraySize-1.
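A minimal sketch of that idea, using the core Tie::File module (the filename and line count here are hypothetical):

```perl
use strict;
use warnings;
use Tie::File;

my $file = 'myhugefile.csv';   # hypothetical filename
my $n    = 1000;

# The tied array maps elements to lines of the file lazily, so the
# whole file is never slurped into memory at once.
tie my @lines, 'Tie::File', $file or die "Can't tie $file: $!";

my $start = @lines > $n ? @lines - $n : 0;
print "$_\n" for @lines[$start .. $#lines];   # Tie::File strips the newlines

untie @lines;
```

Note that Tie::File opens the file read-write by default; pass `mode => O_RDONLY` (with `use Fcntl`) if you want to be sure the data file cannot be modified.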
Another option would be to count the number of lines in the file, then loop through the file once and start reading values at numberOfLines-1000:

    $count = `wc -l < $file`;
    die "wc failed: $?" if $?;
    chomp($count);

That would give you the number of lines (on most systems).
If you know the number of lines in the file, you can do
perl -ne "print if ($. > N);" filename.csv
where N is $num_lines_in_file - $num_lines_to_print. You can count the lines with
perl -e "while (<>) {} print $.;" filename.csv
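The same count-then-print idea as a small standalone script (the filename and line count are hypothetical):

```perl
use strict;
use warnings;

my ($file, $n) = ('myhugefile.csv', 1000);   # hypothetical values

open my $fh, '<', $file or die "Can't read $file: $!";

# First pass: count the lines.
my $total = 0;
$total++ while <$fh>;

# Second pass: rewind and print everything after line $total - $n.
seek $fh, 0, 0;
$. = 0;                     # reset Perl's line counter after the rewind
my $skip = $total - $n;
while (<$fh>) {
    print if $. > $skip;
}
close $fh;
```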
The modules are the way to go. However, sometimes you may be writing a piece of code that you want to run on a variety of machines that may be missing the more obscure CPAN modules. In that case why not just 'tail' and dump the output to a temp file from within Perl?
#!/usr/bin/perl
`tail --lines=1000 /path/myfile.txt > tempfile.txt`;
die "tail failed: $?" if $?;
You then have something that isn't dependent on a CPAN module if installing one may present an issue.
Without relying on tail, which I probably would do, if you have more than $FILESIZE [2GB?] of memory then I'd just be lazy and do:

    my @lines = <>;
    my @lastKlines = @lines[-1000 .. -1];

(note the range operator `..`, not a comma, to take a 1000-element slice). Though the other answers involving tail or seek() are pretty much the way to go on this.
You should absolutely use File::Tail, or better yet another module. It's not a script, it's a module (programming library). It likely works on Windows. As somebody said, you can check this on CPAN Testers, or often just by reading the module documentation or just trying it.
You selected usage of the tail utility as your preferred answer, but that's likely to be more of a headache on Windows than File::Tail.