Is there a significant overhead by using different versions of sha hashing (hashlib module)
Question
The hashlib Python module provides the following hash algorithm constructors: md5(), sha1(), sha224(), sha256(), sha384(), and sha512().
Assuming I don't want to use md5, is there a big difference in using, say, sha1 instead of sha512? I want to use something like hashlib.shaXXX(hashString).hexdigest(), but as it's just for caching, I'm not sure I need the (eventual) extra overhead of 512...
Does this overhead exist, and if so, how big is it?
Solution
Why not just benchmark it?
>>> import hashlib
>>> import timeit
>>> def sha1(s):
...     return hashlib.sha1(s).hexdigest()
...
>>> def sha512(s):
...     return hashlib.sha512(s).hexdigest()
...
>>> t1 = timeit.Timer("sha1('asdf' * 100)", "from __main__ import sha1")
>>> t512 = timeit.Timer("sha512('asdf' * 100)", "from __main__ import sha512")
>>> t1.timeit()
3.2463729381561279
>>> t512.timeit()
6.5079669952392578
So on my machine, sha512 is twice as slow as sha1. But as GregS said, why would you use a secure hash for caching? Try the builtin hash() function, which should be really fast and well tuned:
>>> s = "asdf"
>>> hash(s)
-618826466
>>> s = "xxx"
>>> hash(s)
943435
>>> hash("xxx")
943435
Or better yet, use the builtin Python dictionaries. Maybe you can tell us more about what you plan on caching.
EDIT: I'm thinking that you are trying to achieve something like this:
hash = hashlib.sha1(object_to_cache_as_string).hexdigest()
cache[hash] = object_to_cache
What I was referring to by "use the builtin Python dictionaries" is that you can simplify the above:
cache[object_to_cache_as_string] = object_to_cache
In this way, Python takes care of the hashing so you don't have to!
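A minimal sketch of what that looks like in practice: a plain dict keyed on the string itself, with no hashlib involved (the function and variable names here are illustrative, not from the question):

```python
# Plain-dict caching: Python hashes the string key internally,
# so no explicit sha1/sha512 step is needed.
cache = {}

def get_cached(key_string, compute):
    """Return the cached value for key_string, computing it on a miss."""
    if key_string not in cache:
        cache[key_string] = compute(key_string)
    return cache[key_string]

first = get_cached("asdf" * 100, lambda s: s.upper())
second = get_cached("asdf" * 100, lambda s: s.upper())  # served from cache
```

The second call never runs the lambda; the dict lookup replaces the whole hash-then-store dance.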
Regarding your particular problem, you could refer to Python hashable dicts in order to make a dictionary hashable. Then, all you'd need to do to cache the object is:
cache[object_to_cache] = object_to_cache
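One way to get there, in the spirit of the linked "Python hashable dicts" answer, is a small dict subclass that defines __hash__. This is a hypothetical sketch: it assumes the dict is treated as immutable once used as a key, and the class name is illustrative.

```python
# Hypothetical sketch: a dict that can be used as its own cache key.
# Assumes the dict is not mutated after being used as a key.
class HashableDict(dict):
    def __hash__(self):
        # Hash a frozenset of the items so key order doesn't matter.
        return hash(frozenset(self.items()))

cache = {}
obj = HashableDict(a=1, b=2)
cache[obj] = obj  # the object is its own cache key
```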
OTHER TIPS
Perhaps a naive test... but it looks like it depends on how much you're hashing. 2 blocks of sha512 is faster than 4 blocks of sha256?
>>> import timeit
>>> import hashlib
>>> for sha in [ x for x in dir(hashlib) if x.startswith('sha') ]:
... t = timeit.Timer("hashlib.%s(data).hexdigest()" % sha,"import hashlib; data=open('/dev/urandom','r').read(1024)")
... print sha + "\t" + repr(t.timeit(1000))
...
sha1 0.0084478855133056641
sha224 0.034898042678833008
sha256 0.034902095794677734
sha384 0.01980900764465332
sha512 0.019846916198730469
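The block sizes explain the pattern: sha224/sha256 consume 64-byte blocks, while sha384/sha512 consume 128-byte blocks (using 64-bit operations), so for the same input the 512-bit family runs half as many compression rounds. You can check this with hashlib's own attributes; a quick sketch:

```python
import hashlib

# sha224/sha256 use 64-byte blocks; sha384/sha512 use 128-byte blocks,
# so the latter need half as many compression rounds per byte hashed.
for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    print(name, "block:", h.block_size, "digest:", h.digest_size)
```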