Question

I want to calculate RFC 1321-compliant MD5 (or other) file hashes within MATLAB using the Java security implementations, so I wrote:

mddigest=java.security.MessageDigest.getInstance('MD5');
filestream=java.io.FileInputStream(java.io.File(filename));
digestream=java.security.DigestInputStream(filestream,mddigest);
md5hash=reshape(dec2hex(typecast(mddigest.digest,'uint8')),1,[])

The routine runs without errors, but the result differs from what other tools report.
Could the file encoding be the problem? Shouldn't MATLAB handle that internally?
I'd like to reproduce the results of md5sum (on Linux), which match those from HashCalc (Windows).


Solution

There are two problems:

  1. You never read the file, so no bytes pass through the digest.
  2. dec2hex returns one row per byte, so you have to transpose the matrix before reshaping it.

This code works:

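mddigest   = java.security.MessageDigest.getInstance('MD5');
filestream = java.io.FileInputStream(java.io.File(filename));
digestream = java.security.DigestInputStream(filestream,mddigest);

% Read the whole file through the stream so that every byte is digested.
while digestream.read() ~= -1
end
digestream.close();

% dec2hex(...,2) pads each byte to two hex digits; transpose before reshaping.
md5hash=reshape(dec2hex(typecast(mddigest.digest(),'uint8'),2)',1,[]);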
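To see why the transpose matters, here is a small sketch using a made-up three-byte value (not a real digest). dec2hex returns one row of hex digits per byte, and reshape walks the matrix column by column, so without the transpose the digits come out interleaved.

```matlab
% Hypothetical three-byte value, chosen only to make the ordering visible.
bytes = uint8([18 52 171]);          % 0x12 0x34 0xAB

hexRows = dec2hex(bytes, 2);         % 3-by-2 char matrix: ['12'; '34'; 'AB']

wrong = reshape(hexRows, 1, []);     % column-major read: '13A24B'
right = reshape(hexRows', 1, []);    % transpose first:   '1234AB'
```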

Edit: p.vitzliputzli posted a much faster solution, which should be used instead of this one.

OTHER TIPS

Stephane's solution works but is quite slow: MATLAB cannot supply a Java byte[] array to the read method of the DigestInputStream (or of any other InputStream), so the file has to be read one byte per call.

However, we can adapt Thomas Pornin's solution (discarding the FileInputStream) in order to arrive at:

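mddigest = java.security.MessageDigest.getInstance('MD5');

bufsize = 8192;

fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end

% Feed the file to the digest one buffer at a time.
while ~feof(fid)
    [currData,len] = fread(fid, bufsize, '*uint8');
    if ~isempty(currData)
        mddigest.update(currData, 0, len);
    end
end

fclose(fid);

% dec2hex(...,2) pads each byte to two hex digits.
hash = reshape(dec2hex(typecast(mddigest.digest(),'uint8'),2)',1,[]);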

This solution takes about 0.018 s to compute the hash of a 713 kB file, whereas the byte-by-byte solution takes about 31 s.
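If you want to convince yourself that the buffered loop gives the same value as md5sum, one option is to hash a known input. The sketch below assumes a writable temporary directory and uses the "abc" test vector from the RFC 1321 test suite; the variable names tmpfile and expected are only for illustration. Note that dec2hex returns uppercase digits while md5sum prints lowercase, hence the lower() call.

```matlab
% Expected MD5 of the three bytes "abc" (RFC 1321 test suite).
expected = '900150983cd24fb0d6963f7d28e17f72';

% Write the test bytes to a temporary file (path chosen by tempname).
tmpfile = tempname;
fid = fopen(tmpfile, 'w');
fwrite(fid, uint8('abc'));
fclose(fid);

% Hash the file with the same buffered loop as above.
mddigest = java.security.MessageDigest.getInstance('MD5');
fid = fopen(tmpfile, 'r');
while ~feof(fid)
    [currData, len] = fread(fid, 8192, '*uint8');
    if ~isempty(currData)
        mddigest.update(currData, 0, len);
    end
end
fclose(fid);
delete(tmpfile);

% Format as lowercase hex, two digits per byte, and compare.
hash = lower(reshape(dec2hex(typecast(mddigest.digest(), 'uint8'), 2)', 1, []));
assert(strcmp(hash, expected), 'MD5 mismatch: got %s', hash);
```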

You never read from the DigestInputStream.

This means no bytes will be digested.

You must read the entire file (via the DigestInputStream) and then call digest to get the digest value.
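A quick way to check this, as a rough sketch: call digest on a fresh MessageDigest without feeding it anything. The result is the MD5 of zero bytes, which is what the code in the question effectively computes for every file, whatever its content.

```matlab
% A digest that never received any data hashes the empty input.
mddigest = java.security.MessageDigest.getInstance('MD5');

emptyHash = lower(reshape(dec2hex(typecast(mddigest.digest(), 'uint8'), 2)', 1, []));
% emptyHash is 'd41d8cd98f00b204e9800998ecf8427e', the MD5 of the empty string
```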

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow