Question

For a project, I'm creating a hash table of strings. It uses separate chaining: each occupied position in the table holds a linked list, and each node in that list stores a string along with its frequency. So, when a string is inserted:

1.) If it hashes to a position that is already occupied, but the string itself is NOT yet in the table, it is appended to the list at that position with a frequency of 1.

2.) If the string is already in the table, its frequency is incremented. (A rough sketch of this structure is below.)
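
Roughly, the structure I have in mind looks like this (a simplified sketch in Python; the names are just illustrative):

```python
class Node:
    """One entry in a bucket's linked list: a string and its frequency."""
    def __init__(self, key):
        self.key = key
        self.freq = 1
        self.next = None

class StringHashTable:
    def __init__(self, num_buckets=16):
        self.buckets = [None] * num_buckets
        self.unique_count = 0  # number of distinct strings stored

    def insert(self, key):
        index = hash(key) % len(self.buckets)
        node = self.buckets[index]
        # Case 2: the string is already in the table -> bump its frequency.
        while node is not None:
            if node.key == key:
                node.freq += 1
                return
            node = node.next
        # Case 1: not found -> add a new node with frequency 1
        # (prepending here; appending works the same for this question).
        new_node = Node(key)
        new_node.next = self.buckets[index]
        self.buckets[index] = new_node
        self.unique_count += 1
```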

How would I calculate the load factor for this table? Would it be the number of nodes divided by the total number of positions in the hash table (counting only table slots, not the nodes in the lists)? Or would it be the sum of the frequencies divided by the number of positions in the hash table? Thanks!


Solution

The load factor is computed so that the table can resize itself if the number of elements it holds grows too large. A high load factor means that lookups may start to take a long time, since (on average) more elements have to be searched in each bucket.

In your case, if you're storing duplicates by tracking the frequency of each item, it doesn't make sense to include the duplicates in the load factor. After all, it takes the same amount of time to do a lookup for an item in a bucket where each item has frequency 10^100 as it does to look up items in a bucket where each item has frequency 1.

I would calculate the load factor as the number of unique items divided by the number of buckets, since this gives you the most accurate information about the expected time of a lookup.
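
As a sketch (assuming your table keeps a count of distinct keys; `unique_count` and the 0.75 threshold here are just illustrative choices, not part of your code), the calculation and resize check might look like this:

```python
MAX_LOAD_FACTOR = 0.75  # a common threshold; the exact value is a tuning choice

def load_factor(unique_count, num_buckets):
    """Load factor = distinct keys / buckets. Duplicates only raise frequencies,
    not the number of nodes a lookup has to walk past, so they are excluded."""
    return unique_count / num_buckets

def needs_resize(unique_count, num_buckets):
    # Grow the table (e.g. double the bucket count and rehash) once this is True.
    return load_factor(unique_count, num_buckets) > MAX_LOAD_FACTOR
```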

Hope this helps!
