Question

I am trying to hash the path to a file, including its filename, with the MD5 algorithm. Of course, each filename is unique on the system. What I want to know is: will this hash always be different for different files, or can a hash repeat?

Can it happen that I get the same hash from different files? Are there any restrictions? Thanks.

Solution 2

Hash collisions are an important concern in cryptography and in computer science generally. MD5 is a widely known and widely used hash, but it is no longer collision-resistant: practical techniques exist for deliberately constructing two inputs with the same MD5 hash. Accidental collisions between ordinary inputs remain extremely unlikely, so in practice collisions typically occur only when someone is attempting to create them.

Consider that there are 340282366920938463463374607431768211456 possible MD5 hashes (an MD5 hash is displayed as 32 hexadecimal characters, so that is 16 possible characters raised to the power of 32, which equals 2^128), but there are infinitely many strings that can be hashed (leaving computational limits out of the equation). Some different strings must therefore share the same hash.
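To make the size of the output space concrete, here is a small sketch (the paths are made-up examples, not taken from the question) showing that md5() always returns 32 hexadecimal characters, which is 16 raw bytes or 128 bits, no matter how long the input is:

```php
<?php
// Hypothetical file paths, used only for illustration.
$shortPath = '/a';
$longPath  = '/var/www/uploads/' . str_repeat('x', 10000) . '.txt';

// md5() returns 32 hexadecimal characters by default ...
$shortHex = md5($shortPath);
$longHex  = md5($longPath);

// ... and 16 raw bytes (128 bits) when the second argument is true.
$rawLength = strlen(md5($longPath, true));

echo strlen($shortHex), ' ', strlen($longHex), ' ', $rawLength, PHP_EOL; // prints "32 32 16"
```

Because the output length is fixed while the input length is not, the set of possible inputs is always larger than the set of possible hashes.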

But what is a developer to do if it's possible to have collisions?

I was recently in a meeting with a new friend of mine who runs a business that, among other things, involves cryptography. He said something that I had never thought of before. I can't recall it word for word, but it was something along these lines: "Sure, you can fool my md5, but try to fool both my md5 and my sha256." What he meant is that as developers we have an enormous number of options available and should take advantage of them. We have md5, gost, the sha family, and the list goes on. Hash your string with both sha256 and md5 and the chance of a collision drops tremendously, since two different inputs would have to collide under both algorithms at once. In fact, it will likely drop to the point where your chance of a collision is practically nothing.

An implementation of this:

<?php
// The salt should contain numbers, spaces, letters, special characters, etc.
$salt = "my_secret_salt";
$stringToHash = $theUsersCookieValues;
$time = time();

// Hash the same input with two different algorithms.
$hash_1 = hash('md5', $time . $salt . $stringToHash);
$hash_2 = hash('sha256', $time . $salt . $stringToHash);

setcookie("time_created", $time);
setcookie("user", $theUsersCookieValues);
setcookie("hash_1", $hash_1);
setcookie("hash_2", $hash_2);
?>

While this deals with cookies, not filenames, it is still a great way to implement this principle, in my humble opinion.
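Applied to the file paths from the question, the same idea might look like the sketch below. The helper name combined_path_digest() and the example paths are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: builds one identifier from two independent digests
// of the same path, so a match requires both MD5 and SHA-256 to agree.
function combined_path_digest(string $path): string
{
    return hash('md5', $path) . ':' . hash('sha256', $path);
}

// Made-up example paths.
$a = combined_path_digest('/var/www/uploads/report.pdf');
$b = combined_path_digest('/var/www/uploads/report-2.pdf');

// hash_equals() compares two strings in constant time.
var_dump(hash_equals($a, $a)); // bool(true)
var_dump(hash_equals($a, $b)); // bool(false): different paths, different digests
```

The combined value is at least as hard to collide as SHA-256 alone, so using SHA-256 by itself would likely be enough for most purposes; the pairing mainly illustrates the principle described above.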

OTHER TIPS

It's very unlikely that you will get a hash collision on your file names. It is possible, however, so you may want to treat it as a potential source of bugs in your application (depending on how many strings you are going to be hashing).
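One way to guard against that kind of bug is to store the original path next to its hash and check for a mismatch on every insert. The following is a minimal sketch under assumptions: the PathRegistry class and the paths are hypothetical, and the typed property requires PHP 7.4 or later.

```php
<?php
// Hypothetical registry: maps md5(path) => path and refuses to silently
// overwrite an entry when two different paths produce the same hash.
final class PathRegistry
{
    /** @var array<string, string> */
    private array $pathsByHash = [];

    public function add(string $path): string
    {
        $hash = md5($path);

        if (isset($this->pathsByHash[$hash]) && $this->pathsByHash[$hash] !== $path) {
            // Two different paths share one hash: a real collision.
            throw new RuntimeException(
                "MD5 collision: {$path} vs {$this->pathsByHash[$hash]}"
            );
        }

        $this->pathsByHash[$hash] = $path;
        return $hash;
    }
}

$registry = new PathRegistry();
$first  = $registry->add('/var/www/uploads/a.txt');
$again  = $registry->add('/var/www/uploads/a.txt'); // same path, same hash, no error
$second = $registry->add('/var/www/uploads/b.txt');
```

With this check in place, a collision surfaces as an exception instead of one file silently replacing another.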

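You don't mention whether there is any cryptographic reason for hashing the filenames. Note that hashing is one-way and is not encryption, and that PHP's crypt() function is designed for password hashing rather than for this purpose. If you need the hashes to be cryptographically secure, use a stronger algorithm such as SHA-256 through PHP's hash() function instead; MD5 has not been considered secure for a long time (http://www.kb.cert.org/vuls/id/836068).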

Yes, you can potentially get hash collisions. See the Wikipedia articles on the pigeonhole principle and the birthday paradox to understand why.
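For a rough sense of scale, the birthday approximation p ≈ 1 - exp(-n(n-1) / (2 · 2^128)) can be evaluated directly. This sketch assumes MD5 outputs behave like uniformly random 128-bit values, which holds only for non-adversarial inputs; the function name is hypothetical.

```php
<?php
// Hypothetical helper: approximate probability that at least two of $n
// distinct inputs share an MD5 hash, assuming uniformly random outputs.
function md5_collision_probability(float $n): float
{
    $space = pow(2.0, 128); // number of possible MD5 values

    // Birthday approximation: p = 1 - exp(-n(n-1) / (2 * space)).
    $exponent = -($n * ($n - 1.0)) / (2.0 * $space);

    // -expm1(x) equals 1 - exp(x) but stays accurate for tiny x.
    return -expm1($exponent);
}

// One billion hashed paths: the result is on the order of 1e-21.
printf("%.2e\n", md5_collision_probability(1e9));
```

The estimate says nothing about deliberately crafted collisions, which are feasible for MD5 regardless of how few inputs are involved.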

Licensed under: CC-BY-SA with attribution