Question

So, I have a couple of system backup image files that are around 1 terabyte each, and I want to quickly calculate a hash of each of them (preferably SHA-1).

At first I tried to calculate the MD5 hash; 2 hours passed and it still hadn't finished (which I suppose is to be expected for files as large as 1TB).

So is there any program/implementation out there that can hash a 1TB file quickly?

I have heard of tree hashing, which hashes parts of the file simultaneously, but I haven't found an implementation so far.


Solution

If you have a 1 million MB file, and your system can read this file at 100MB/s, then

  • 1TB * 1000 (GB/TB) = 1000 GB
  • 1000 GB * 1000 (MB/GB) = 1 million MB
  • 1 million MB / 100 (MB/s) = 10 thousand seconds
  • 10,000 s / 3600 (s/hr) = 2.77... hr
  • Therefore, a 100MB/s system has a hard floor of about 2.8 hours just to read the file, before any additional time needed to actually compute a hash.

Your expectations are probably unrealistic - don't try to calculate a faster hash until you can perform a faster file read.
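
For reference, here is a minimal sketch (not part of the original answer) of a plain streaming SHA-1 using the core Digest::SHA module; the file name is taken from the command line. SHA-1 on a modern CPU runs well above 100MB/s, so the run time of something like this is dominated almost entirely by how fast the disk can deliver the data - exactly the floor calculated above.

#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA;

my $file = shift or die "usage: $0 <file>\n";   # path supplied by the caller
my $sha  = Digest::SHA->new(1);                 # 1 selects the SHA-1 algorithm
$sha->addfile( $file, "b" );                    # stream the whole file in binary mode
print $sha->hexdigest, "\n";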

Other tips

Old and already answered, but you may try hashing only selected chunks of the file.

There is a Perl solution I found somewhere that seems effective (the code is not mine):

#!/usr/bin/perl

use strict;
use Time::HiRes qw[ time ];
use Digest::MD5;

sub quickMD5 {
    my $fh = shift;
    my $md5 = Digest::MD5->new;

    $md5->add( -s $fh );                        # mix the file size into the digest

    my $pos = 0;
    until( eof $fh ) {
        seek $fh, $pos, 0;                      # jump to the start of the next block
        read( $fh, my $block, 4096 ) or last;   # hash only the first 4KB of that block
        $md5->add( $block );
        $pos += 2048**2;                        # advance by 2048**2 bytes = 4MB
    }
    return $md5;
}

open FH, '<', $ARGV[0] or die $!;
printf "Processing $ARGV[0] : %u bytes\n", -s FH;

my $start = time;
my $qmd5 = quickMD5( *FH );
printf "Partial MD5 took %.6f seconds\n", time() - $start;
print "Partial MD5: ", $qmd5->hexdigest, "\n";

Basically the script performs an MD5 over the first 4KB of every 4MB block in the file (the original version actually did it for every 1MB).
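
To try the script above (the names here are only examples), save it as quickmd5.pl and run it against one of the images:

perl quickmd5.pl /path/to/backup.img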

I suggest you take a look at non-cryptographic hashes (e.g. xxHash and MurmurHash3); they are much faster than MD5, until of course you hit the maximum read speed.
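
If the xxhsum command-line tool that ships with xxHash happens to be installed, a quick comparison against md5sum on the same file shows the difference (the file name below is just a placeholder); both are still capped by the disk's read speed:

xxhsum /path/to/backup.img
md5sum /path/to/backup.img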

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow