C# GetHashCode with two Int16, also returns only up to Int32?
-
31-05-2021 - |
Pregunta
Sorry to combine two questions into one, they are related.
HashCode
s for HashSet
s and the such. As I understand it, they must be unique, not change, and represent any configuration of an object as a single number.
My first question is that for my object, containing the two Int16s a
and b
, is it safe for my GetHashCode
to return something like a * n + b
where n is a large number, I think perhaps Math.Pow(2, 16)
?
Also GetHashCode
appears to inflexibly return specifically the type Int32.
32bits can just about store, for example, two Int16s, a single unicode character or 16 N, S, E, W compass directions, it's not much, even something like a small few node graph would probably be too much for it. Does this represent a limit of C# Hash collections?
Solución
As I understand it, they must be unique
Nope. They can't possibly be unique for most types, which can have more than 232 possible values. Ideally, if two objects have the same hash code then they're unlikely to be equal - but you should never assume that they are equal. The important point is that if they have different hash codes, they should definitely be unequal.
My first question is that for my object, containing the two Int16s a and b, is it safe for my GetHashCode to return something like a * n + b where n is a large number, I think perhaps Math.Pow(2, 16).
If it only contains two Int16
values, it would be simplest to use:
return (a << 16) | (ushort) b;
Then the value will be unique. Hoorah!
Also
GetHashCode
appears to inflexibly return specifically the typeInt32
.
Yes. Types such as Dictionary
and HashSet
need to be able to use the fixed size so they can work with it to put values into buckets.
32bits can just about store, for example, two Int16s, a single unicode character or 16 N, S, E, W compass directions, it's not much, even something like a small few node graph would probably be too much for it. Does this represent a limit of C# Hash collections?
If it were a limitation, it would be a .NET limitation rather than a C# limitation - but no, it's just a misunderstanding of what hash codes are meant to represent.
Eric Lippert has an excellent (obviously) blog post about GetHashCode
which you should read for more information.
Otros consejos
GetHashCode
is not (and cannot be) unique for every instance of an object. Take Int64
, for example; even if the hash function is perfectly distributed, there will be two four billion Int64
s that hash to every value, since the hash code is, as you mentioned, only an Int32
.
However this is not a limitation on collections using hash codes; they are simply use buckets for elements which hash to the same value. So a lookup into a hash table isn't guaranteed to be a single operation. Getting the correct bucket is a single operation, but there may be multiple items in that bucket.