Question

I create a GUID (as a string) and get the hash of it. Can I consider this hash to be unique?

Was it helpful?

Solution

Not as reliably unique as the GUID itself, no.

Just to expand, you are reducing your uniqueness by a factor of 4, going from 16 bytes to 4 bytes of possible combinations.

As pointed out in the comments the hash size will make a difference. The 4 byte thing was an assumption, horrible at best I know, that it may be used in .NET, where the default hash size is 4 bytes (int). So you can replace what I said above with whatever byte size your hash may be.

OTHER TIPS

In a word, no.

Let's assume that your hash has fewer bits than the GUID, by the pigeon hole principle, there must exist more than one mapping of some GUID -> hash simply because there are fewer hashes than GUIDS.

If we assume that the hash has a larger number of bits than the GUID, there is a very small--but finite--chance of a collision, assuming you're using a good hash function.

No hash function that reduces an arbitrary sized data block to a fixed size number of bits will produce a 1-to-1 mapping between the two. There will always exist a chance of having two different data blocks be reduced to the same sequence of bits in the hash.

Good hash algorithms minimizes the likelihood of this happening, and generally, the more bits in the hash, the less chance of a collision.

It's not guranteed to be, due to hash collisions. The GUID itself is almost-guaranteed to be.

For practical reasons you probably can assume that a hash is unique, but why not use the GUID itself?

No, and I wouldn't assume uniqueness of any hash value. That shouldn't matter because hash values don't need to unique, they just need to evenly distributed across their range. The more even the distribution, the fewer collisions occur (in the hashtable). Fewer collisions mean better hashtable performance.

fyi For a good description of how hash tables work, read the accepted answer to What are hashtables and hashmaps and their typical use cases?

If you use cryptographic hash (MD5, SHA1, RIPEMD160), the hash will be unique (modulo collisions which are very improbable -- SHA1 is used e.g. for digital signatures, and MD5 is also collision-resistant on random inputs). Though, why do you want to hash a GUID?

I would like to hash a GUID to X size with the realization that sometimes I have 10 or less GUIDS in set so I might get away with a shorter hash without collision than if I have 10,000,000 GUID in a set. I would just like to be able to specify the size of the hash when I call the function.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top