python checksum verification of two large files

https://stackoverflow.com/questions/9796981

25-05-2021

Question

I am trying to create a checksum of two files to compare them. This is the script I am using:

>>> import hashlib
>>> import datetime
>>> f = open('myfile.mov', 'rb')
>>> def checkF(f, block_size=2**20):
...     print datetime.datetime.now()
...     h = hashlib.sha1()
...     while True:
...             data = f.read(block_size)
...             if not data:
...                     break
...             h.update(data)
...     print datetime.datetime.now()
...     return h.digest()
... 
>>> checkF(f)
2012-03-21 09:33:40.704032
2012-03-21 09:33:40.704490
'\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t'

Firstly, I'm not familiar with the output. Is this the string I can use to compare to the other file? Secondly, running this script on the same file a second time gives a different result. It seems to be related to how much time has passed between passes. I don't fully understand what's happening here. Any help would be great.

Solution

checkF reads f all the way to the end, and the file position stays at the end after the call returns. You have to reopen the file every time you call checkF, or reset the file position with f.seek(0). That's why you get different hashsums: the first one is the hash of the file contents, and all later ones are hashsums of the empty string (i.e. da39a3ee5e6b4b0d3255bfef95601890afd80709). The time elapsed between calls has nothing to do with it.
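Here is a minimal sketch of the f.seek(0) fix, written for Python 3 rather than the question's Python 2 (so the timing print statements are dropped). It keeps the question's checkF name and uses io.BytesIO as an in-memory stand-in for myfile.mov; both are assumptions made so the example runs on its own.

```python
import hashlib
import io


def checkF(f, block_size=2**20):
    """Return the raw SHA-1 digest of an open binary file object.

    The f.seek(0) call rewinds the file first, so calling checkF twice on
    the same object hashes the full contents both times.
    """
    f.seek(0)  # without this, a second call starts at end-of-file and hashes b""
    h = hashlib.sha1()
    while True:
        data = f.read(block_size)
        if not data:
            break
        h.update(data)
    return h.digest()


# io.BytesIO stands in for open('myfile.mov', 'rb') so the sketch runs anywhere.
f = io.BytesIO(b"example file contents")
print(checkF(f) == checkF(f))  # True: both calls hash the same bytes
```

With a real file, the same pattern applies to the object returned by open('myfile.mov', 'rb').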

To get the hashsum as a hexadecimal string (for human consumption), call h.hexdigest() instead of h.digest(). digest() returns the hashsum as a bytestring, which is more compact but not human-readable. Either form works for comparing the two files, as long as you use the same one for both.
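If you would rather reopen the file on every call, a path-based helper does that naturally and can return the readable hex form directly. This is a sketch assuming Python 3; file_sha1_hex and files_match are hypothetical names for illustration, not part of hashlib.

```python
import hashlib


def file_sha1_hex(path, block_size=2**20):
    """Return the SHA-1 of the file at `path` as a 40-character hex string."""
    h = hashlib.sha1()
    # Opening the file inside the function gives every call a fresh file
    # position, and the with block closes the file when it finishes.
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()  # readable text; h.digest() would return raw bytes


def files_match(path_a, path_b):
    """Hypothetical helper: True if both files have the same SHA-1."""
    return file_sha1_hex(path_a) == file_sha1_hex(path_b)
```

For two local files, filecmp.cmp(path_a, path_b, shallow=False) from the standard library compares the bytes directly; a checksum is more useful when you need to store it or compare against a copy on another machine.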

Other tips

In Python 2 you can convert the raw digest to hex yourself:

>>> '\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t'.encode('hex')
'da39a3ee5e6b4b0d3255bfef95601890afd80709'

But you probably just want to use hexdigest() instead.
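str.encode('hex') only exists in Python 2. A minimal Python 3 sketch of the same conversion, using the SHA-1 of the empty string as sample input:

```python
import binascii
import hashlib

raw = hashlib.sha1(b"").digest()  # 20 raw bytes, like the question's output

# Python 3 replacements for Python 2's raw.encode('hex');
# all three print da39a3ee5e6b4b0d3255bfef95601890afd80709
print(raw.hex())                              # bytes.hex(), Python 3.5+
print(binascii.hexlify(raw).decode("ascii"))  # works on any Python 3
print(hashlib.sha1(b"").hexdigest())          # or ask the hasher directly
```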

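You also forgot to close the file with f.close(). Call it once you are done with checkF(f), or open the file in a with block so it is closed automatically. Leaving a file open does not change what you read from it, but it ties up an OS file handle until the object is garbage-collected, and for files opened for writing, buffered data may not reach the disk when you expect.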

Licensed under: CC-BY-SA with attribution
