Question

If I calculate the probability of collisions using:

def collision(hash_t, items):
    """Probability of at least one collision when hashing
    `items` keys uniformly into a table with `hash_t` slots."""
    prob_no_collision = 1.0
    for i in range(1, items):
        # i slots are already taken, so the next key misses them
        # with probability (hash_t - i) / hash_t
        prob_no_collision *= (hash_t - i) / float(hash_t)
    return 1 - prob_no_collision

Is there an easy way to create a model that will calculate the cost of lookups and insertions in the hash table based on the probability of collisions, so then I could decide the optimum size based on the speed v memory allocation?


Solution

The exact answer depends on your hash function and key type (which determine how the hashing takes place), the size of each hash entry (which in Python differs between 32-bit and 64-bit builds), your collision-handling strategy, and your time/memory requirements. That said, the following is a good rule of thumb:

Use a load factor of 2/3.

That is, make the hash table 3/2 the size of your number of elements. So if you have 1,000 elements, you allocate 1,500 slots. If each slot is 32 bits (an assumption based on a 32-bit Python installation; if this is wrong, someone correct me), then you are wasting almost 2 kB, which is a truly tiny amount of memory. If you had 2,000,000 elements, you would waste almost 4 MB, which is also tiny.
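As a rough sketch of that arithmetic (the function name and the 4-bytes-per-slot figure are illustrative assumptions from the paragraph above, not CPython's real dict entry size):

def table_sizing(num_elements, bytes_per_slot=4):
    """Rough sizing for a 2/3 load factor (table = 3/2 * elements)."""
    table_size = num_elements * 3 // 2      # keeps load factor <= 2/3
    unused_slots = table_size - num_elements
    return table_size, unused_slots * bytes_per_slot

print(table_sizing(1_000))       # (1500, 2000)        -> ~2 kB unused
print(table_sizing(2_000_000))   # (3000000, 4000000)  -> ~4 MB unused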

In short, the common concern with hash tables is rarely space but rather time. CPython's own dict implementation uses a maximum load factor of 2/3 before expanding the table, because many collision-resolution strategies degrade noticeably once the load factor reaches roughly 70% to 80%.
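To connect this back to the question, here is a minimal cost model, assuming open addressing with uniform hashing (the standard textbook approximations: about 1/(1 - a) probes per insertion or unsuccessful lookup and (1/a)·ln(1/(1 - a)) per successful lookup at load factor a; chained tables behave differently):

import math

def expected_probes(num_elements, table_size):
    """Textbook estimates for open addressing with uniform hashing."""
    a = num_elements / table_size            # load factor
    if a >= 1.0:
        raise ValueError("table is full")
    insert = 1.0 / (1.0 - a)                              # insert / miss
    lookup_hit = (1.0 / a) * math.log(1.0 / (1.0 - a)) if a > 0 else 1.0
    return insert, lookup_hit

# Cost grows slowly until the table gets quite full:
for load in (0.5, 2/3, 0.8, 0.9):
    ins, hit = expected_probes(int(load * 1000), 1000)
    print(f"load {load:.2f}: ~{ins:.1f} probes/insert, ~{hit:.1f} probes/hit")

Picking the smallest table whose predicted probe counts you can live with is exactly the speed-versus-memory trade-off the question asks about, and at a 2/3 load factor those counts are still small.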

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow