Question

I have a perl script where I am writing out a very large log file. Currently I write out my file in the 'traditional' Perl way of doing it:

open FILE, ">", 'log.txt';
print FILE $line;
.....
close FILE;

I've heard a lot of good things about File::Slurp when reading in files, and how it can vastly improve runtimes. My question is: would using File::Slurp make writing out my log file any faster? I ask because writing out a file in Perl seems pretty simple as it is; I don't see how File::Slurp could really optimize it any further.
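For reference, here is roughly what I assume the File::Slurp version would look like (the file name and the placeholder data are just for illustration):

use strict;
use warnings;
use File::Slurp qw(write_file);

# Accumulate every log line in memory, then write them all out in one call
my @lines = map { "log entry $_\n" } 1 .. 1000;   # placeholder data
write_file( 'log.txt', @lines );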


Solution

The File::Slurp utilities may, under certain circumstances, be fractionally faster overall than the equivalent streamed implementation, but file I/O is so very much slower than anything based solely on memory and CPU speed that it is almost always the limiting resource.

I have never heard any claims that File::Slurp can vastly improve runtimes and would appreciate seeing a reference to that effect. The only way I could see it being a more efficient solution is if the program requires random access to the file or has to read it multiple times. Because the data is all in memory at once, there is no overhead in accessing any of it; but in that case my preference would be for Tie::File, which makes the data appear to be available all at once with little speed impact and far less memory overhead.
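A minimal sketch of what I mean by the Tie::File approach (the file name is just for illustration):

use strict;
use warnings;
use Tie::File;

# Tie the file to an array; records are fetched from disk on demand, so
# the whole file never has to be held in memory at once
tie my @lines, 'Tie::File', 'log.txt'
    or die "Cannot tie log.txt: $!";

# Random access and in-place edits look like ordinary array operations
print $lines[42], "\n" if @lines > 42;
$lines[0] = 'amended first line';

untie @lines;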

In fact, a call to read_file may well make the process seem much slower to the user. If the file is significantly large, then the time taken to read all of it and split it into lines may amount to a distinct delay before processing can start, whereas opening a file and reading the first line will usually appear to be instantaneous.
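To make the contrast concrete, these are roughly the two start-up paths (the file name is illustrative; read_file in list context returns the lines of the file):

use strict;
use warnings;
use File::Slurp qw(read_file);

# Slurping: nothing can happen until the whole file has been read and
# split into lines, which is a noticeable pause for a very large file
my @all_lines = read_file('big_log.txt');

# Streaming: the first line is available almost immediately
open my $fh, '<', 'big_log.txt' or die "Cannot open big_log.txt: $!";
my $first_line = <$fh>;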

The same applies at the end of the program. A call to write_file, which combines the data into disk blocks and pages it out to the file, will take substantially longer than simply closing the file.

In general, the traditional streaming output method is preferable. It has little or no speed impact, and it avoids data loss by saving the data incrementally instead of waiting until a vast swathe of data has been accumulated in memory before discovering that it cannot be written to disk for one reason or another.
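Here is a sketch of what I mean by streamed output with incremental error checking (the file name and work loop are placeholders):

use strict;
use warnings;

# Each line goes to disk as it is generated, so a full disk or similar
# failure is reported at the point it happens rather than after a vast
# amount of data has been accumulated in memory
open my $fh, '>', 'log.txt' or die "Cannot open log.txt: $!";

for my $n (1 .. 1000) {    # placeholder work loop
    print {$fh} "log entry $n\n" or die "Write to log.txt failed: $!";
}

close $fh or die "Cannot close log.txt: $!";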

My advice is to reserve File::Slurp for small files where random access could significantly simplify the program code. Even then, there is nothing wrong with

my @data = do {
  open my $fh, '<', 'my_file' or die $!;
  <$fh>;
};

for input, or

open my $fh, '>', 'out_file' or die $!;
print { $fh } $_ for @data;
close $fh or die $!;

for output. Particularly in your case, where you are dealing with a very large log file, I think there is no question that you should stick to streamed output methods.

OTHER TIPS

File::Slurp is mostly a convenience. Instead of writing the usual open, while read/write, close code, you only have the one-liners read_file and write_file.

However, I don't know about it being any faster than your own code. It is coded in Perl, not in C. In the case of the array variant, write_file $file_name, @lines, it might also be a bit inefficient with memory, as it first joins all the array lines into a single scalar before writing that out.
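If that copy matters, the File::Slurp documentation describes passing a reference to a single scalar instead of a list, which as far as I know avoids the extra internal copy (file names here are placeholders):

use strict;
use warnings;
use File::Slurp qw(write_file);

my @lines = map { "line $_\n" } 1 .. 3;

# List form: write_file joins @lines into one big scalar internally,
# briefly needing roughly twice the memory for the data
write_file( 'out_list.txt', @lines );

# Scalar-ref form: build the string yourself and pass a reference, so
# write_file should not have to copy the data again
my $content = join '', @lines;
write_file( 'out_ref.txt', \$content );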

However, it does use syswrite instead of buffered writes. It can safely do that because it is the only function accessing the file handle during its lifetime. So yes, it might be faster because of that.
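For illustration, a direct syswrite looks something like this; it is only a rough sketch of the idea, not File::Slurp's actual code:

use strict;
use warnings;

# One big unbuffered write straight to the OS, bypassing PerlIO's
# userspace buffer. Mixing syswrite with buffered print on the same
# handle is unsafe, so a handle used this way should only see syswrite.
open my $fh, '>', 'out.txt' or die "Cannot open out.txt: $!";

my $data = join '', map { "log entry $_\n" } 1 .. 1000;

my $written = syswrite $fh, $data;
die "syswrite to out.txt failed: $!"
    unless defined $written and $written == length $data;

close $fh or die "Cannot close out.txt: $!";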

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow