Using hashlib to compute the md5 digest of a file in Python 3

https://stackoverflow.com/questions/7829499

27-10-2019
Question

With Python 2.7, the following code computes the MD5 hexdigest of the contents of a file.

(EDIT: Well, not really, as the answers have shown. I just thought it did.)

import hashlib

def md5sum(filename):
    f = open(filename, mode='rb')
    d = hashlib.md5()
    for buf in f.read(128):
        d.update(buf)
    return d.hexdigest()

Now, if I run this code with Python 3, it raises a TypeError exception:

    d.update(buf)
TypeError: object supporting the buffer API required

I figured I could make this code run on both Python 2 and Python 3 by changing it to:

def md5sum(filename):
    f = open(filename, mode='r')
    d = hashlib.md5()
    for buf in f.read(128):
        d.update(buf.encode())
    return d.hexdigest()

Now I still wonder why the original code stopped working. It seems that when a file is opened with the binary mode modifier, it returns integers instead of strings encoded as bytes (I say this because type(buf) returns int). Is this behavior explained somewhere?

Solution

I think you wanted the for-loop to make successive calls to f.read(128). That can be done using iter() and functools.partial():

import hashlib
from functools import partial

def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        for buf in iter(partial(f.read, 128), b''):
            d.update(buf)
    return d.hexdigest()

print(md5sum('utils.py'))
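
To see what the two-argument form of iter() does here, the following is a minimal sketch that uses io.BytesIO as a stand-in for a file opened in binary mode. The sample data and the chunk size of 4 are illustrative assumptions, not part of the original answer.

```python
import io
from functools import partial

# In-memory stand-in for a file opened with mode='rb' (illustrative data).
f = io.BytesIO(b"abcdefghij")

# iter(callable, sentinel) calls the callable repeatedly and stops as soon
# as it returns the sentinel; f.read() returns b'' at end of file.
chunks = list(iter(partial(f.read, 4), b''))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

Each element is a bytes object, which is what d.update() expects.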

Other tips

for buf in f.read(128):
  d.update(buf)

... updates the hash sequentially with each of the first 128 byte values of the file. Since iterating over a bytes object produces int objects, you get calls like the following, which cause the error you encountered in Python 3:

d.update(97)
d.update(98)
d.update(99)
d.update(100)

which is not what you want.
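
The behavior can be checked directly in a Python 3 interpreter. This is a small sketch with made-up sample data; the exact wording of the TypeError message may differ between Python versions, so it is printed rather than assumed.

```python
import hashlib

data = b"abcd"  # illustrative sample, standing in for f.read(128)

# In Python 3, iterating over bytes yields integers, not 1-byte strings.
items = list(data)
print(items)  # [97, 98, 99, 100]

# Slicing, by contrast, keeps the bytes type.
print(data[0:1])  # b'a'

d = hashlib.md5()
try:
    d.update(items[0])  # equivalent to d.update(97)
except TypeError as exc:
    print("TypeError:", exc)  # message text varies by Python version
```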

Instead, you want:

def md5sum(filename):
  with open(filename, mode='rb') as f:
    d = hashlib.md5()
    while True:
      buf = f.read(4096) # 128 is smaller than the typical filesystem block
      if not buf:
        break
      d.update(buf)
    return d.hexdigest()
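
As a quick sanity check, hashing a file in chunks should give the same digest as hashing all of its bytes at once. The sketch below repeats the function so that it runs on its own; the chunksize parameter, the temporary file, and the payload are assumptions added for illustration.

```python
import hashlib
import os
import tempfile

def md5sum(filename, chunksize=4096):
    """Hash a file chunk by chunk (chunksize is an assumed parameter)."""
    d = hashlib.md5()
    with open(filename, mode='rb') as f:
        while True:
            buf = f.read(chunksize)
            if not buf:  # b'' means end of file
                break
            d.update(buf)
    return d.hexdigest()

payload = b"x" * 10000  # longer than one chunk, so several reads happen

# Write the payload to a temporary file, hash it, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name
try:
    chunked = md5sum(path)
finally:
    os.remove(path)

whole = hashlib.md5(payload).hexdigest()
print(chunked == whole)  # True
```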

I finally changed my code to the version below (which I find easy to understand) after asking the question. But I will probably change it to the version suggested by Raymond Hettinger using functools.partial.

import hashlib

def chunks(filename, chunksize):
    # Yield successive chunks of the file; read() returns b'' at end of file.
    with open(filename, mode='rb') as f:
        while True:
            buf = f.read(chunksize)
            if not buf:
                break
            yield buf

def md5sum(filename):
    d = hashlib.md5()
    for buf in chunks(filename, 128):
        d.update(buf)
    return d.hexdigest()

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow