I think what you're really trying to ask is:
"Why does a HashSet add objects with inequal hash codes even if they claim to be equal?"
The difference between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. The other posters have done a thoroughly sufficient job of explaining why this is not a bug; however, they have not addressed the underlying question.
I will try to do so here. I would suggest rephrasing your question to remove the accusations of poor documentation and bugs in Java, so that you can more directly explore why you're seeing the behavior you're seeing.
The equals() documentation states (emphasis added):

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode(), we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around: a.hashCode() != b.hashCode() implies a.equals(b) will be false.
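To make the turned-around invariant concrete, here is a minimal sketch (the Point class is hypothetical, not from the question) of a class whose equals() and hashCode() honor the contract, so that unequal hash codes alone are enough to prove two objects unequal:

```java
import java.util.Objects;

// A hypothetical value class that honors the equals()/hashCode() contract.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override public int hashCode() {
        // Equal fields always produce equal hash codes.
        return Objects.hash(x, y);
    }
}

public class HashInvariantDemo {
    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        Point c = new Point(3, 4);
        // Because the contract holds, differing hash codes prove
        // inequality without ever calling equals():
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.hashCode() == c.hashCode()); // false
        System.out.println(a.equals(b));                  // true
    }
}
```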
If you look at the code for HashMap (which HashSet uses internally), you'll notice a static nested class Entry, defined like so:
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
    ...
}
HashMap stores the key's hash code along with the key and value. Because a hash code is expected not to change over the time a key is stored in the map (see Map's documentation: "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map."), it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.
Now let's look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:
public V put(K key, V value) {
    ...
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // Replace existing element and return
        }
    }
    // Insert new element
}
In particular, notice that due to short-circuit evaluation, the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.
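As a sketch of how that failure looks in practice (the BrokenKey class is a made-up example, not from the question), consider a class that overrides equals() but leaves hashCode() as Object's identity-based default. Two "equal" instances will almost certainly hash into different buckets, so equals() is never even consulted:

```java
import java.util.HashSet;
import java.util.Set;

// A hypothetical class that violates the contract: equals() compares the
// name field, but hashCode() is inherited from Object (identity hash).
final class BrokenKey {
    final String name;
    BrokenKey(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        return o instanceof BrokenKey && name.equals(((BrokenKey) o).name);
    }
    // hashCode() deliberately NOT overridden.
}

public class DuplicateDemo {
    public static void main(String[] args) {
        BrokenKey a = new BrokenKey("x");
        BrokenKey b = new BrokenKey("x");
        System.out.println(a.equals(b)); // true

        Set<BrokenKey> set = new HashSet<>();
        set.add(a);
        set.add(b);
        // The identity hash codes of a and b (almost certainly) differ,
        // so each lands in its own bucket and the set holds "duplicates":
        System.out.println(set.size()); // 2
    }
}
```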
Note that your claim that "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality; therefore it specifies the behavior of this method in terms of equality, using (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules, however, HashSet cannot be expected to behave in any useful way.
Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?
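As a small sketch of that kind of nonsense (this comparator is a contrived example, not taken from the linked question), a Comparator that never returns 0 breaks the ordering contract, and a TreeSet built on it will accept the very same string three times:

```java
import java.util.TreeSet;

public class BadComparatorDemo {
    public static void main(String[] args) {
        // A contrived, contract-violating comparator: it ignores its
        // arguments and claims the left element is always greater.
        TreeSet<String> set = new TreeSet<>((a, b) -> 1);
        set.add("x");
        set.add("x");
        set.add("x");
        // A Set should never contain duplicates, yet this one reports 3,
        // because the comparator never declares two elements equal:
        System.out.println(set.size()); // 3
    }
}
```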