You could use GNU gperf to generate a perfect hash function for your keys (a minimal sketch appears at the end of this page).
HashMap implementation optimization
21-07-2023
Problem
I have a hashmap containing about half a million entries. The key is a string built by concatenating 5 different inputs; the domain of each input is small, but the combination of the 5 inputs produces this huge map (500K items). Now I am thinking of optimizing this structure.
My idea is to hash the key (the combination of the 5 inputs) by hashing each individual input, combining those 5 hashes into one single hash (a 32- or 64-bit int), and then looking that hash up instead.
My question is: is there a known data structure that handles this situation well, and is that optimization worth doing? I want to optimize both memory and run-time.
I am using C++ and std::unordered_map. The key is the combined string from the 5 inputs, and I didn't find any pattern, random or sequential, relating the inputs to the outputs. For example:
125 458 699 sadsadasd 5 => 56.
125 458 699 sadsadasd 3 => 57.
125 458 699 sadsadasd 4 => 58.
125 458 699 sadsadasd 5 => 25.
125 458 699 gsdfsds 3 => 89.
The domain of each input is small: the 4th input has about 2K distinct values, while each of the other inputs has only about 20.
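For concreteness, here is a minimal sketch of the combining idea described in the question, assuming the 5 inputs are four small integers and one short string as in the examples above. The names and the mixing constant are illustrative (in the spirit of boost::hash_combine), not a recommended implementation; note that two different input tuples can in principle collide on the same 64-bit key.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Fold one value's hash into an accumulated 64-bit seed
// (boost::hash_combine-style mixing; the constant is illustrative).
template <typename T>
void hash_combine(std::uint64_t& seed, const T& value) {
    seed ^= std::hash<T>{}(value) + 0x9e3779b97f4a7c15ULL
            + (seed << 6) + (seed >> 2);
}

// Collapse the 5 inputs into a single 64-bit key.
std::uint64_t make_key(int a, int b, int c, const std::string& s, int d) {
    std::uint64_t seed = 0;
    hash_combine(seed, a);
    hash_combine(seed, b);
    hash_combine(seed, c);
    hash_combine(seed, s);
    hash_combine(seed, d);
    return seed;
}

// The map then stores 8-byte keys instead of full concatenated strings,
// but a collision between two distinct input tuples would go undetected.
std::unordered_map<std::uint64_t, int> table;
```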
Solution
Other tips
It seems to me that there is no way to reduce the size of your keys that still gives reliable lookups. Hashing the 5 inputs down to one integer is a one-way, lossy mapping: two different input tuples can collide on the same integer, so a lookup on the hash alone cannot be guaranteed to find the right entry.
The way round this would be to keep a translation table, but that is actually more overhead, because each distinct tuple of inputs would then require storage for two hashes plus the tuple itself.
I think you're best off using a std::tuple<int, int, int, std::string, int> as the key type in a single map. If you use a std::map<tuple<...>, data_type> you won't need to provide a hashing function. If you stay with unordered_map, you'll need to provide one, since std::tuple has no default std::hash<> specialisation.
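A minimal sketch of that suggestion, under the assumption that the 5 inputs are three ints, a string, and an int as in the examples above (the hasher below is illustrative; any reasonable hash-combining scheme works):

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <unordered_map>

using Key = std::tuple<int, int, int, std::string, int>;

// std::map works out of the box: std::tuple already provides operator<.
std::map<Key, int> ordered_lookup;

// std::unordered_map needs an explicit hasher, because there is no
// std::hash specialisation for std::tuple. This one just combines the
// element hashes; the mixing constant is illustrative.
struct KeyHash {
    std::size_t operator()(const Key& k) const {
        std::size_t seed = 0;
        auto combine = [&seed](std::size_t h) {
            seed ^= h + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
        };
        combine(std::hash<int>{}(std::get<0>(k)));
        combine(std::hash<int>{}(std::get<1>(k)));
        combine(std::hash<int>{}(std::get<2>(k)));
        combine(std::hash<std::string>{}(std::get<3>(k)));
        combine(std::hash<int>{}(std::get<4>(k)));
        return seed;
    }
};

std::unordered_map<Key, int, KeyHash> unordered_lookup;
```

Either container looks up the exact 5-field key, so hash collisions can never be mistaken for matches.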
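Regarding the GNU gperf suggestion at the top of this page: gperf takes a fixed list of keys (optionally with associated values) and generates a C/C++ lookup function that is collision-free for exactly that key set, which fits this case because all the keys are known up front. The input file below is a hand-written sketch using a few of the example keys; the file name and the handful of entries are placeholders for a generated list.

```
%{
#include <string.h>
%}
%struct-type
struct entry { const char *name; int value; };
%%
"125 458 699 sadsadasd 5", 56
"125 458 699 sadsadasd 3", 57
"125 458 699 sadsadasd 4", 58
"125 458 699 gsdfsds 3", 89
%%
```

Running gperf keys.gperf > keys.c emits a hash function plus an in_word_set(str, len) lookup that returns a pointer to the matching entry (or a null pointer), with no collision handling needed at run time.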