Question

I am trying to create a checksum of two files to compare them. This is the script I am using:

>>> import hashlib
>>> import datetime
>>> f = open('myfile.mov', 'rb')
>>> def checkF(f, block_size=2**20):
...     print datetime.datetime.now()
...     h = hashlib.sha1()
...     while True:
...             data = f.read(block_size)
...             if not data:
...                     break
...             h.update(data)
...     print datetime.datetime.now()
...     return h.digest()
... 
>>> checkF(f)
2012-03-21 09:33:40.704032
2012-03-21 09:33:40.704490
'\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t'

Firstly, I'm not familiar with the output. Is this the string I can use to compare to the other file? Secondly, running this script on the same file a second time gives a different result. It seems to be related to how much time has passed between passes. I don't fully understand what's happening here. Any help would be great.


Solution

You have to reopen the file every time you call checkF, or reset the position of the file pointer with f.seek(0). That's why you get different hashsums: the first call reads the file through to the end, so it returns the hash of the file contents. The file pointer then stays at the end, so every later call reads nothing and returns the hashsum of the empty string (i.e. da39a3ee5e6b4b0d3255bfef95601890afd80709).
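A minimal sketch of that effect, using an in-memory io.BytesIO object as a stand-in for the opened file. The helper name sha1_hex and the sample bytes are made up for illustration; the read loop is the same as in checkF.

```python
import hashlib
import io

def sha1_hex(stream, block_size=2**20):
    # Hash whatever lies between the current position and the end of the stream.
    h = hashlib.sha1()
    while True:
        data = stream.read(block_size)
        if not data:
            break
        h.update(data)
    return h.hexdigest()

# Stand-in for open('myfile.mov', 'rb'); the contents are arbitrary sample bytes.
f = io.BytesIO(b"some file contents")

first = sha1_hex(f)    # hashes the contents; the pointer is now at the end
second = sha1_hex(f)   # nothing left to read, so this is the hash of b""
f.seek(0)              # rewind to the start
third = sha1_hex(f)    # hashes the contents again

print(first == third)                            # True
print(second == hashlib.sha1(b"").hexdigest())   # True
```

The second result is the empty-string hash quoted above, which is why repeated calls on the same open file object stop matching the first one.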

To get the hashsum as a hexadecimal string (for human consumption), simply call h.hexdigest() instead of h.digest(), which returns the hashsum as a bytestring (more compact, but not human-readable).
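Putting both points together, one possible way to structure the script is to pass a path instead of an open file, so each call opens the file fresh and starts at offset 0. This is only a sketch: the names file_sha1 and same_contents are hypothetical, not part of the original script.

```python
import hashlib

def file_sha1(path, block_size=2**20):
    """Return the SHA-1 of the file at `path` as a hexadecimal string."""
    h = hashlib.sha1()
    # Opening inside the function guarantees reading starts at the beginning;
    # the with-block also closes the file when the loop is done.
    with open(path, 'rb') as f:
        while True:
            data = f.read(block_size)
            if not data:
                break
            h.update(data)
    return h.hexdigest()

def same_contents(path_a, path_b):
    # Two files with identical bytes produce identical hex digests.
    return file_sha1(path_a) == file_sha1(path_b)
```

Calling file_sha1('myfile.mov') repeatedly should then return the same string each time, and same_contents('myfile.mov', 'copy.mov') compares two files (the second filename is just a placeholder).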

OTHER TIPS

>>> '\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t'.encode('hex')
'da39a3ee5e6b4b0d3255bfef95601890afd80709'

But you probably want to just use hexdigest() instead.
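Note that the 'hex' codec used with .encode() above is a Python 2 feature; it is not available for strings in Python 3. As a hedged alternative, binascii.hexlify converts a raw digest to hex in both versions (Python 3.5+ also offers bytes.hex()). The sketch below hashes the empty string purely as sample input.

```python
import binascii
import hashlib

raw = hashlib.sha1(b"").digest()                  # 20 raw bytes
hex_a = binascii.hexlify(raw).decode("ascii")     # manual conversion to hex text
hex_b = hashlib.sha1(b"").hexdigest()             # what hexdigest() gives directly

print(hex_a == hex_b)   # True
print(hex_a)            # da39a3ee5e6b4b0d3255bfef95601890afd80709
```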

You forgot to close the file with f.close(). Please put this after calling checkF(f); Python can sometimes give unpredictable results if you don't close a file before the end of your program.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow