Frage

Anyone know if MD5, Whirlpool, SHA[n], etc., have any "special" input that might get a hexdigest output to align into:

  • All numeric characters
  • All alpha characters
  • All of the same character/pattern repeated consistently or entirely

Example in python:

>>> from hashlib import sha1
>>> hash = sha1('magic_word').hexdigest()
>>> hash
4040404040404040404040404040404040404040
>>> hash = sha1('^3&#b d   *#"').hexdigest()
aedefeebadcdccebefadcedddcbeadaedcbdeadc

Is this even possible? My knowledge of hashing functions is limited to the scope of applying them in databases for storing passwords, which is essentially none.

But sometimes I wonder, when testing for collisions, that these sorts of cases might arise...

War es hilfreich?

Lösung

A hash function models a random oracle: for each input, if it was not yet queried before, we throw some dice to find an output, then note it to some book. If an input is queried again, simply give back this old value.

By throwing a 16-sided dice 40 times (for each input), we get enough output for an SHA-1 like oracle. (For MD5, we only need 32 times.)

So, we can calculate the probability of "40 times only letters" as (6/16)^40 ≈ 9.15·10^-18, "40 times only digits" has probability (10/16)^40 ≈ 6.8·10^-9.

As "number of tries needed until the first success" is geometrically distributed, we need 1/p tries in average, i.e. around 10^17 tries for "only letters", and 1.5 ·10^8 tries for "only digits".

(Now, SHA-1 is not a real random oracle, but there is no weakness known which would say that SHA-1 would have better or worse probabilities for one of these. And for now, brute-force really seems to be the best way to do this.)

Andere Tipps

I'm sure with the right input, those sorts of outputs are possible. Why does it matter? Just curious?

Yes, it is possible. Given the right input, any desired bit pattern can be output. It might take a few million years to find the right input though.

For a reasonably wide target, like all hex 0-9 or all hex a-f it should be relatively easy. Calculating the proportion of acceptable outputs, in all possible outputs will help you get an estimate of the running time. Brute force or random searching will eventually find something that hits the target. For a broken hash, like MD4, you might be able to shave something off the expected time.

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top