Pergunta

I've always loved trees, that nice O(n*log(n)) and the tidiness of them. However, every software engineer I've ever known has asked me pointedly why I would use a TreeSet. From a CS background, I don't think it matters all that much which you use, and I don't care to mess around with hash functions and buckets (in the case of Java).

In which cases should I use a HashSet over a TreeSet?

Foi útil?

Solução

From a CS background, I don't think it matters all that much which you use,

There are at least four significiant differences from a CS background:

  • A hash table lookup is O(1) rather than O(log(n))
  • A hash table requires a hash function, a tree requires a comparison function.
  • A tree provides sorted iteration, a hash table doesn't even have a well defined iteration order.
  • The peformance of a tree lookup degenerates to O(n) when it's imbalanced, but implementations can avoid this completely by being auto-rebalancing. Hash table lookup performance degenerates to O(n) when there are too many collisions, and avoiding this completely is very hard (Java's implementation manages this since Java 8, but only if the keys are also comparable - it switches to using a tree internally).

and I don't care to mess around with hash functions and buckets (in the case of Java).

You don't have to worry about buckets at all, the implementation class hides that from you completely. You do need a decent hashCode() implementation (an an equals() consistent with it), but that is not hard to come by.

In which cases should I use a HashSet over a TreeSet?

When you cannot easily define a comparison function, or when performance is essential - O(1), remember? There is also an implementation quirk: HashMap allows you to use null as a key value, TreeMap does not.

But nowadays the recommendation really is to always use a HashSet (for the superior performance) unless you have specific reasons not to (typically when you need the sorted iteration).

Licenciado em: CC-BY-SA com atribuição
scroll top