Question

I know that a hash value(for example: md5 value) can have connection with multiple values like '^&#%we242eweqweqweqwedfdfdfee2', '%$#%3423efffe435%%^#'

But as most users are actually use a very simple password, are those md5 values can only have relationship with a limited simple cleartxt passwords?

I mean if 'cfcd208495d565ef66e7dff9f98764da' just have connection with 30 simple values like '0','tom123','goodcar', then a hacker who has get md5 data from a database would easily figure out the relationship between a username and its cleartext password, and then could use this pair of value to hack the same account on other websites.

So, is any specified md5 value only responsible for a limited simple values?

PS: I know I can add salt or use better method like sha512, sha3, but I'm very curious about the question above.

Was it helpful?

Solution

The question depends on what your understanding of "simple values" is. Generally speaking cryptographic hash function try to emulate a random mapping of arbitrary length inputs to fixed length outputs. The most fundamental security notion of those cryptographic hashes is so called collision resistance, i.e. it is computational infeasible to find a pair of input messages that hash to the same fixed length output. As you have demonstrated this notion is now broken with md5 as you can construct special messages that do indeed collide under md5.

But as you were talking about "simple values" I assume you exclude such artificially crafted messages and then we can still view md5 as a random mapping.

For such a random mapping the chance of a collision only depends on size of the input domain. For example if you are looking at all 6 character passwords out of the charset {a-z, A-Z, 0-9} you can be sure that there will be no collision (and you can even try it yourself as Chris has pointed out). But if you expand that size to 25 characters out of the same charset there is guaranteed to be a collision as there are now more possible passwords than available hash values.

Estimating the chance of a collision is called the birthday problem. As a simple estimated if you have k possible output values you can expect there to be a collision when you reach sqrt(k) input values. So for md5 with k=2^128 you expect a collision if your input value set approaches the size of 2^64.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top