Pergunta

I am to write a chained hash set class in java.

I understand the load factor is M/capacity where M is the number of elements currently in the table and capacity is the size of the table.

But how does the load factor help me determine if I should resize the table and rehash or not? Also I couldn't find anywhere how to calculate the lower / upper load factors. Are they even needed?

I hope this is enough information, Thank you!!

Foi útil?

Solução

A single loadFactor used to configure standard Java hashes (and in a number of hash APIs in other languages) is a simplification.

Conceptually, it is reasonable to distingush

  • Target load, indicate a default memory footprint -- performance tradeoff configuration. when you build a hash of the known size, you choose capacity so that size/capacity is as close to target load, as possible.

  • Max load, you want a hash to never exceed this load. You trigger a resize, if a hash reach this load.

  • Grow factor, a default configuration of how much to enlarge a hash on resize. If capacity is a power of 2, grow factor could be only either 2 or 4.

  • Min load, you want a hash load to never be lower than the min load, maybe unless you remove elements or clear a hash. If capacity is a power of 2, min load couldn't be greater than 0.5. Additionally, max load / min load ratio should be greater or equal to grow factor.

All the above concerns chained hash, for open addressing hash with tombstones things are getting even more complicated.

In java.util.HashMap loadFactor plays the roles of target and max load simultaneously. Grow factor is 2, min load is 0.0.

For chained hash non-power-of-2 capacity is overkill unless you need extremely precise control over memory usage (you don't, trust me) or a capacity between 2^30 and 2^31-1 (because you can't create an array of size 2^31 in Java, it is Integer.MAX_VALUE + 1).

Outras dicas

It works the other way around: it's not that the load factor helps you; you explicitely decide the load factor based on your performance tests, to avoid wasting time rehashing and still have the acceptable retrieval and iteration performance.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top