I'm not sure how you came up with
key %= 19
key %= 7
But I find that very suspicious, and possibly wrong.
The basic idea behind using prime numbers is a good one, but %
is the modulo operation, thus key
after the first line will be in the range [0,18]
, and [0,6]
after the second line, thus there are only 7
possible hash values.
This would be a better way to do it:
for(i = 0; i < len; i++)
key = 19*key + name[i];
key % tableSize;
Simply multiplying each value by 10 (as you did) isn't ideal, as abc
and cab
would have the same hash value - you want to have the weight them differently, i.e. multiply the first value by 1, the second by 10, the third by 100, etc. - using prime numbers here helps to minimize conflicts.
Making tableSize
a prime number could also help to minimize conflicts.
Taking the modulo of division by a prime number twice doesn't help.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ...
% 19 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 0 1 2 3 4 5 6 ...
% 7 0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3 4 0 1 2 3 4 5 6 ...
See what happened there at 19
- we skipped 5
and 6
- this isn't good - that means that 0-4
's are more likely to occur, so that instantly increases the chance of conflicts for those.
You'll need to rehash everything anyway (because if the file size changes, the hash values may change, and entries offset by collisions may no longer be offset), so it doesn't really matter whether you use the same file or another one - you'll still need to read everything, and write everything.
Keep in mind that using the same file adds complexity - you'll need to have an in-memory structure to store all the data before you wipe the file and start writing to it again. With the separate files, you could probably use your code with just minor changes.
If the entire file can't fit into memory (which is one reason why you'd choose to use files rather than memory structures), you don't really have another choice than using different files, as the intermediate in-memory structure won't be an option.
Once done with the old file, you can just delete it.