Question

I need to calculate a SHA-256 hash of a large file (or portion of it). My implementation works fine, but its much slower than the C++'s CryptoPP calculation (25 Min. vs. 10 Min for ~30GB file). What I need is a similar execution time in C++ and Java, so the hashes are ready at almost the same time. I also tried the Bouncy Castle implementation, but it gave me the same result. Here is how I calculate the hash:

int buff = 16384;
try {
    RandomAccessFile file = new RandomAccessFile("T:\\someLargeFile.m2v", "r");

    long startTime = System.nanoTime();
    MessageDigest hashSum = MessageDigest.getInstance("SHA-256");

    byte[] buffer = new byte[buff];
    byte[] partialHash = null;

    long read = 0;

    // calculate the hash of the hole file for the test
    long offset = file.length();
    int unitsize;
    while (read < offset) {
        unitsize = (int) (((offset - read) >= buff) ? buff : (offset - read));
        file.read(buffer, 0, unitsize);

        hashSum.update(buffer, 0, unitsize);

        read += unitsize;
    }

    file.close();
    partialHash = new byte[hashSum.getDigestLength()];
    partialHash = hashSum.digest();

    long endTime = System.nanoTime();

    System.out.println(endTime - startTime);

} catch (FileNotFoundException e) {
    e.printStackTrace();
}
Was it helpful?

Solution

My explanation may not solve your problem since it depends a lot on your actual runtime environment, but when I run your code on my system, the throughput is limited by disk I/O and not the hash calculation. The problem is not solved by switching to NIO, but is simply caused by the fact that you're reading the file in very small pieces (16kB). Increasing the buffer size (buff) on my system to 1MB instead of 16kB more than doubles the throughput, but with >50MB/s, I am still limited by disk speed and not able to fully load a single CPU core.

BTW: You can simplify your implementation a lot by wrapping a DigestInputStream around a FileInputStream, read through the file and get the calculated hash from the DigestInputStream instead of manually shuffling the data from a RandomAccessFile to the MessageDigest as in your code.


I did a few performance tests with older Java versions and there seem to be a relevant difference between Java 5 and Java 6 here. I'm not sure though if the SHA implementation is optimized or if the VM is executing the code much faster. The throughputs I get with the different Java versions (1MB buffer) are:

  • Sun JDK 1.5.0_15 (client): 28MB/s, limited by CPU
  • Sun JDK 1.5.0_15 (server): 45MB/s, limited by CPU
  • Sun JDK 1.6.0_16 (client): 42MB/s, limited by CPU
  • Sun JDK 1.6.0_16 (server): 52MB/s, limited by disk I/O (85-90% CPU load)

I was a little bit curious on the impact of the assembler part in the CryptoPP SHA implementation, as the benchmarks results indicate that the SHA-256 algorithm only requires 15.8 CPU cycles/byte on an Opteron. I was unfortunately not able to build CryptoPP with gcc on cygwin (the build succeeded, but the generated exe failed immediately), but building a performance benchmark with VS2005 (default release configuration) with and without assembler support in CryptoPP and comparing to the Java SHA implementation on an in-memory buffer, leaving out any disk I/O, I get the following results on a 2.5GHz Phenom:

  • Sun JDK1.6.0_13 (server): 26.2 cycles/byte
  • CryptoPP (C++ only): 21.8 cycles/byte
  • CryptoPP (assembler): 13.3 cycles/byte

Both benchmarks compute the SHA hash of a 4GB empty byte array, iterating over it in chunks of 1MB, which are passed to MessageDigest#update (Java) or CryptoPP's SHA256.Update function (C++).

I was able to build and benchmark CryptoPP with gcc 4.4.1 (-O3) in a virtual machine running Linux and got only appr. half the throughput compared to the results from the VS exe. I am not sure how much of the difference is contributed to the virtual machine and how much is caused by VS usually producing better code than gcc, but I have no way to get any more exact results from gcc right now.

OTHER TIPS

Perhaps the first thing today is work out where you are spending the most time? Can you run it through a profiler and see where the most time is being spent.

Possible improvements:

  1. Use NIO to read the file in the fastest possible way
  2. Update the Hash in a separate thread. This is actually rather hard to do and isn't for the faint hearted as it involves safe publishing between threads. But if your profiling shows a significant amount of time being spent in hash algorithm it may make better use of the disk.

I suggest you use a profiler like JProfiler or the one integrated in Netbeans (free) to find out, where the time is actually spent and concentrate on that part.

Just a wild guess - not sure if it will help - but have you tried the Server VM? Try starting the app with java -server and see if that helps you. The server VM is more aggressive compiling Java code to native than the default client VM is.

It used to be that Java ran about 10x slower than the same C++ code. Nowadays is closer to 2x slower. I think what your running into is just a fundamental part of Java. JVMs will get faster, especially as new JIT techniques are discovered, but you'll have a hard time out performing C.

Have you tried alternative JVMs and/or compilers? I used to get better performance with JRocket, but less stability. Ditto for using jikes over javac.

Since you apparently have a working C++ implementation which is fast, you could build a JNI bridge and use the actual C++ implementation or maybe you could try not reinventing the wheel, especially since it's a big one and use a premade library such as BouncyCastle which has been made to solve all cryptographic needs of your program.

I think this difference in performance might only be platform related. Try changing the buffer size and see if there are any improvements. If not, I would go with JNI (Java Native Interface). Just call the C++ implementation from Java.

The MAIN reason why your code is so slow is because you use a RandomAccessFile which always has been quite slow performance-wise. I suggest using a "BufferedInputStream" so that you may benefit from all the power of the OS-level caching for disk-i/o.

The code should look something like:

    public static byte [] hash(MessageDigest digest, BufferedInputStream in, int bufferSize) throws IOException {
    byte [] buffer = new byte[bufferSize];
    int sizeRead = -1;
    while ((sizeRead = in.read(buffer)) != -1) {
        digest.update(buffer, 0, sizeRead);
    }
    in.close();

    byte [] hash = null;
    hash = new byte[digest.getDigestLength()];
    hash = digest.digest();
    return hash;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top