Question

Is there any Linux command-line implementation that performs exceptionally well at generating SHA-1 hashes of large files (up to 2GB)?

I have played around with 'openssl sha1', and it takes minutes to get the SHA-1 of a 2GB file.


Solution

I don't think a SHA-1 implementation can be optimized specifically for large inputs: the algorithm processes data in fixed-size blocks, and because each block's computation depends on the output of the previous one, the work cannot be parallelized. The fastest implementation on a small file should therefore also be the fastest on a large one.

OTHER TIPS

On my machine, for a 1GB file, with enough memory for the entire file to be cached after the first run:

sha1sum: 3.92s
openssl sha1: 3.48s
python hashlib.sha1: 3.22s
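For reference, the Python figure was presumably measured with incremental hashing; here is a minimal sketch of that kind of benchmark, where the file name test.bin and the 1MiB chunk size are my own assumptions:

    import hashlib
    import time

    def sha1_file(path, chunk_size=1024 * 1024):
        # Hash the file in fixed-size chunks so memory use stays
        # constant regardless of file size.
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    start = time.time()
    print(sha1_file("test.bin"))  # hypothetical 1GB test file
    print(f"{time.time() - start:.2f}s")

Run it twice: the second run reads from the page cache, so it measures hashing speed rather than disk speed.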

"it takes minutes to get the SHA-1 of a 2GB file"

There's something wrong there then, unless you're using incredibly slow old hardware. Even on the first run, where the file was being read directly from disc, 'openssl sha1' only took about 20 seconds per gigabyte on my machine. Are you having slow I/O problems in general?
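One quick check is to time a plain read of the file with no hashing at all; a rough sketch (the file name is a placeholder):

    import time

    def read_throughput(path, chunk_size=1024 * 1024):
        # Sequentially read the file without hashing, so this
        # measures I/O speed alone, in MiB/s.
        total = 0
        start = time.time()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                total += len(chunk)
        return total / (1024 * 1024) / (time.time() - start)

    print(f"{read_throughput('test.bin'):.1f} MiB/s")

If this alone takes minutes for a 2GB file, the hash implementation isn't the problem.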

Your problem is likely disk I/O. A basic SHA-1 implementation on an old 2.0GHz Core Duo processor can process /dev/zero at 100MiB/s - faster than most hard drives typically paired with such a system.
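To look at it from the other side, you can measure pure hash throughput on in-memory zeros, roughly what hashing /dev/zero does, with no disk in the picture; a minimal sketch using Python's hashlib:

    import hashlib
    import time

    chunk = b"\x00" * (1024 * 1024)  # 1 MiB of zeros, no disk involved
    h = hashlib.sha1()
    start = time.time()
    for _ in range(1024):  # hash 1 GiB in total
        h.update(chunk)
    print(f"{1024 / (time.time() - start):.0f} MiB/s")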

Show us the speeds you're currently seeing, and on what hardware.

If you don't specifically need SHA-1 but just want to checksum large files, sha512sum can be faster on some 64-bit hardware, since SHA-512 operates on 64-bit words and processes larger (128-byte) internal blocks; it's worth measuring on your own machine, as sketched below.
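A small comparison using Python's hashlib (the buffer size and totals are arbitrary):

    import hashlib
    import time

    chunk = b"\x00" * (1024 * 1024)
    for name in ("sha1", "sha512"):
        h = hashlib.new(name)
        start = time.time()
        for _ in range(256):  # hash 256 MiB with each digest
            h.update(chunk)
        print(name, f"{256 / (time.time() - start):.0f} MiB/s")

Results vary with the CPU, which is why it's worth running this locally rather than assuming one digest wins.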

sha1sum is what I'd use for computing SHA-1 checksums... it's designed to do exactly one thing so I would hope it does it as fast as practically possible. I don't have any 2GB files to benchmark it on though :-(

EDIT: After some tests on an ISO image it looks like the limiting factor on my system is disk I/O speed - not surprising, although I feel kind of silly for not thinking of that earlier. Once that's corrected for, it seems like openssl is about twice as fast as sha1sum...

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow