Question

I've got multiple classes that, for certain reasons, do not follow the official Equals contract. In the overwritten GetHashCode() these classes simply return 0 so they can be used in a Hashmap.

Some of these classes implement the same interface and there are Hashmaps using this interface as key. So I figured that every class should at least return a different (but still constant) value in GetHashCode().

The question is how to select this value. Should I simply let the first class return 1, the next class 2 and so on? Or should I try something like

class SomeClass : SomeInterface {
    public overwrite int GetHashCode() {
        return "SomeClass".GetHashCode();
    }
}

so the hash is distributed more evenly? (Do I have to cache the returned value myself or is Microsoft's compiler able to optimize this?)

Update: It is not possible to return an individual hashcode for each object, because Equals violates the contract. Specifially, I'm refering to this problem.

Was it helpful?

Solution

I ran into this exact problem when writing a vector class. I wanted to compare vectors for equality, but float operations give rounding errors, so I wanted approximate equality. Long story short, overriding equals is a bad idea unless your implementation is symmetric, reflexive, and transitive.

Other classes are going to assume equals has those properties, and so will classes using those classes, and so you can end up in weird cases. For example a list might enforce uniqueness, but end up with two elements which evaluate as equal to some element B.

A hash table is the perfect example of unpredictable behavior when you break equality. For example:

//Assume a == b, b == c, but a != c
var T = new Dictionary<YourType, int>()
T[a] = 0
T[c] = 1
return T[b] //0 or 1? who knows!

Another example would be a Set:

//Assume a == b, b == c, but a != c
var T = new HashSet<YourType>()
T.Add(a)
T.Add(c)
if (T.contains(b)) then T.remove(b)
//surely T can't contain b anymore! I sure hope no one breaks the properties of equality!
if (T.contains(b)) then throw new Exception()

I suggest using another method, with a name like ApproxEquals. You might also consider overriding the == operator, because it isn't virtual and therefore won't be used accidentally by other classes like Equals could be.

If you really can't use reference equality for the hash table, don't ruin the performance of cases where you can. Add an IApproxEquals interface, implement it in your class, and add an extension method GetApprox to Dictionary which enumerates the keys looking for an approximately equal one, and returns the associated value. You could also write a custom dictionary especially for 3-dimensional vectors, or whatever you need.

OTHER TIPS

If it "violates the Equals contract", then I'm not sure you should be using it as a key.

It something is using that as a key, you really need to get the hashing right... it is very unclear what the Equals logic is, but two values that are considered equal must have the same hash-code. It is not required that two values with the same hash-code are equal.

Using a constant string won't really help much - you'll get the values split evenly over the types, but that is about it...

I'm curious what the reasoning would be for overriding GetHashCode() and returning a constant value. Why violate the idea of a hash rather than just violating the "contract" and not overriding the GetHashCode() function at all and leave the default implementation from Object?

Edit

If what you've done is that so you can have your objects match based on their contents rather than their reference then what you propose with having different classes simply use different constants can WORK, but is highly inefficient. What you want to do is come up with a hashing algorithm that can take the contents of your class and produce a value that balances speed with even distribution (that's hashing 101).

I guess I'm not sure what you're looking for...there isn't a "good" scheme for choosing constant numbers for this paradigm. One is not any better than the other. Try to improve your objects so that you're creating a real hash.

When hash collisions occur, the HashTable/Dictionary calls Equals to find the key you're looking for. Using a constant hash code removes the speed advantages of using a hash in the first place - it becomes a linear search.

You're saying the Equals method hasn't been implemented according to the contract. What exactly do you mean with this? Depending on the kind of violation, the HashTable or Dictionary will merely be slow (linear search) or not work at all.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top