Question

I saved this piece of code as hash.py, and when I compute the hash of this file it gives me a hash completely different from the one produced by the inbuilt md5sum command (I am using KUbuntu 13.04). Why is that? Aren't they both supposed to produce the same result? I should also mention that calculating the hash of a huge file (I tested on a 4.5 GB ISO file) takes at least 7 seconds with the inbuilt md5sum, but this Python script is almost instant.

""" filename: hash.py """
import sys
import hashlib
file_name = sys.argv[0]
hash_obj = hashlib.md5(file_name)
print "MD5 - "+ hash_obj.hexdigest()

Output:

meow@VikkyHacks:~/Arena/py$ python hash.py 
MD5 - d18a4085140ad0c8ee7671d8ba2065fc

Output from the inbuilt default command:

meow@VikkyHacks:~/Arena/py$ md5sum hash.py 
5299f3588cb0de6cf27930181be73e80  hash.py

Solution 2

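You are taking the file path from sys.argv[0] and computing its MD5, that is, the MD5 of the path as a string rather than of the file it names. That is also why the script finishes almost instantly: it hashes a handful of characters, not the contents of a 4.5 GB file. To compute the MD5 of the file contents, use: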

import sys
import hashlib

# sys.argv[0] is this script's own path; use sys.argv[1] to hash a file named on the command line
file_path = sys.argv[0]
with open(file_path, 'rb') as file_handle:
    file_contents = file_handle.read()
print('MD5 - ' + hashlib.md5(file_contents).hexdigest())

EDIT

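Using hashlib.md5(open(file_name, 'rb').read()) is bad practice because it never closes the file explicitly. Closing is left to the garbage collector, and when that happens depends on the Python implementation.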
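
For a file as large as the 4.5 GB ISO mentioned in the question, reading everything with a single read() call holds the whole file in memory. A minimal sketch of hashing in chunks instead is shown here; the helper name md5_of_file and the 1 MiB default chunk size are assumptions made for illustration, not part of the original answer.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file's contents, read chunk by chunk.

    md5_of_file and chunk_size are hypothetical names chosen for this sketch.
    """
    digest = hashlib.md5()
    with open(path, 'rb') as file_handle:
        # iter() with a sentinel keeps calling read() until it returns b'' (end of file).
        for chunk in iter(lambda: file_handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file(sys.argv[1]) from a script (after importing sys) would hash the file named on the command line. Feeding the data to update() piece by piece gives the same digest as hashing all the bytes at once, so the result should agree with md5sum for the same file.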

OTHER TIPS

In the first case you are hashing the file name; in the second you are hashing the file's contents.
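
To see the two cases side by side, here is a small sketch that returns both digests for one path. The function name name_and_content_digests is hypothetical. Note that in Python 3, hashlib.md5 accepts only bytes, so the path string has to be encoded first; the UTF-8 encoding used here is an assumption.

```python
import hashlib


def name_and_content_digests(path):
    """Return (MD5 of the path string, MD5 of the file's bytes).

    Hypothetical helper, written only to contrast the two cases.
    """
    # Case 1: hash the characters of the name itself, as the question's script does.
    name_digest = hashlib.md5(path.encode('utf-8')).hexdigest()
    # Case 2: hash the bytes stored in the file, as md5sum does.
    with open(path, 'rb') as file_handle:
        content_digest = hashlib.md5(file_handle.read()).hexdigest()
    return name_digest, content_digest
```

Run on 'hash.py', the first value would correspond to what the question's script printed, and the second to what md5sum reported.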

Licensed under: CC-BY-SA with attribution