Pregunta

Sorry to combine two questions into one, they are related.

HashCodes for HashSets and the such. As I understand it, they must be unique, not change, and represent any configuration of an object as a single number.

My first question is that for my object, containing the two Int16s a and b, is it safe for my GetHashCode to return something like a * n + b where n is a large number, I think perhaps Math.Pow(2, 16)?

Also GetHashCode appears to inflexibly return specifically the type Int32.

32bits can just about store, for example, two Int16s, a single unicode character or 16 N, S, E, W compass directions, it's not much, even something like a small few node graph would probably be too much for it. Does this represent a limit of C# Hash collections?

¿Fue útil?

Solución

As I understand it, they must be unique

Nope. They can't possibly be unique for most types, which can have more than 232 possible values. Ideally, if two objects have the same hash code then they're unlikely to be equal - but you should never assume that they are equal. The important point is that if they have different hash codes, they should definitely be unequal.

My first question is that for my object, containing the two Int16s a and b, is it safe for my GetHashCode to return something like a * n + b where n is a large number, I think perhaps Math.Pow(2, 16).

If it only contains two Int16 values, it would be simplest to use:

return (a << 16) | (ushort) b;

Then the value will be unique. Hoorah!

Also GetHashCode appears to inflexibly return specifically the type Int32.

Yes. Types such as Dictionary and HashSet need to be able to use the fixed size so they can work with it to put values into buckets.

32bits can just about store, for example, two Int16s, a single unicode character or 16 N, S, E, W compass directions, it's not much, even something like a small few node graph would probably be too much for it. Does this represent a limit of C# Hash collections?

If it were a limitation, it would be a .NET limitation rather than a C# limitation - but no, it's just a misunderstanding of what hash codes are meant to represent.

Eric Lippert has an excellent (obviously) blog post about GetHashCode which you should read for more information.

Otros consejos

GetHashCode is not (and cannot be) unique for every instance of an object. Take Int64, for example; even if the hash function is perfectly distributed, there will be two four billion Int64s that hash to every value, since the hash code is, as you mentioned, only an Int32.

However this is not a limitation on collections using hash codes; they are simply use buckets for elements which hash to the same value. So a lookup into a hash table isn't guaranteed to be a single operation. Getting the correct bucket is a single operation, but there may be multiple items in that bucket.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top