You could use GNU gperf to generate a perfect hash function for your keys (a minimal sketch appears at the end of this page).
HashMap implementation optimization
21-07-2023
Problem
I have a hashmap containing about half a million entries. The key is a string built by concatenating 5 different inputs; the domain of each input is small, but the combination of the 5 inputs produces this huge map (500K items). Now I am thinking of optimizing this structure.
My idea is to hash the key (the combination of the 5 inputs) by hashing each individual input, combining those 5 hashes into one single hash (a 32- or 64-bit int), and then looking that hash up instead.
My question is: is there a known data structure that handles this situation well, and is that optimization worth doing? I want to optimize both memory and run-time.
I am using C++ and std::unordered_map. The key is the combined string from the 5 inputs, and I didn't find any pattern, random or sequential, relating the inputs to the outputs. For example:
125 458 699 sadsadasd 5 => 56.
125 458 699 sadsadasd 3 => 57.
125 458 699 sadsadasd 4 => 58.
125 458 699 sadsadasd 5 => 25.
125 458 699 gsdfsds 3 => 89.
The domain of each input is small: the 4th input has about 2K distinct values, while each of the other inputs has only about 20.
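For concreteness, here is a minimal sketch of the combining idea described in the question, assuming the 5 inputs are four small integers and one short string as in the examples above. The names and the mixing constant are illustrative (in the spirit of boost::hash_combine), not a recommended implementation; note that two different input tuples can in principle collide on the same 64-bit key.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Fold one value's hash into an accumulated 64-bit seed
// (boost::hash_combine-style mixing; the constant is illustrative).
template <typename T>
void hash_combine(std::uint64_t& seed, const T& value) {
    seed ^= std::hash<T>{}(value) + 0x9e3779b97f4a7c15ULL
            + (seed << 6) + (seed >> 2);
}

// Collapse the 5 inputs into a single 64-bit key.
std::uint64_t make_key(int a, int b, int c, const std::string& s, int d) {
    std::uint64_t seed = 0;
    hash_combine(seed, a);
    hash_combine(seed, b);
    hash_combine(seed, c);
    hash_combine(seed, s);
    hash_combine(seed, d);
    return seed;
}

// The map then stores 8-byte keys instead of full concatenated strings,
// but a collision between two distinct input tuples would go undetected.
std::unordered_map<std::uint64_t, int> table;
```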
Solution
Other tips
It seems to me that there is no way to reduce the size of your keys that still gives reliable lookups. Hashing the 5 inputs down to one integer is a one-way, lossy mapping: two different input tuples can collide on the same integer, so a lookup on the hash alone cannot be guaranteed to find the right entry.
The way round this would be to keep a translation table, but that is actually more overhead, because each distinct tuple of inputs would then require storage for two hashes plus the tuple itself.
I think you're best off using a std::tuple<int, int, int, std::string, int> as the key type in a single map. If you use a std::map<tuple<...>, data_type> you won't need to provide a hashing function. If you stay with unordered_map, you'll need to provide one, since std::tuple has no default std::hash<> specialisation.
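A minimal sketch of that suggestion, under the assumption that the 5 inputs are three ints, a string, and an int as in the examples above (the hasher below is illustrative; any reasonable hash-combining scheme works):

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <unordered_map>

using Key = std::tuple<int, int, int, std::string, int>;

// std::map works out of the box: std::tuple already provides operator<.
std::map<Key, int> ordered_lookup;

// std::unordered_map needs an explicit hasher, because there is no
// std::hash specialisation for std::tuple. This one just combines the
// element hashes; the mixing constant is illustrative.
struct KeyHash {
    std::size_t operator()(const Key& k) const {
        std::size_t seed = 0;
        auto combine = [&seed](std::size_t h) {
            seed ^= h + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
        };
        combine(std::hash<int>{}(std::get<0>(k)));
        combine(std::hash<int>{}(std::get<1>(k)));
        combine(std::hash<int>{}(std::get<2>(k)));
        combine(std::hash<std::string>{}(std::get<3>(k)));
        combine(std::hash<int>{}(std::get<4>(k)));
        return seed;
    }
};

std::unordered_map<Key, int, KeyHash> unordered_lookup;
```

Either container looks up the exact 5-field key, so hash collisions can never be mistaken for matches.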
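Regarding the GNU gperf suggestion at the top of this page: gperf takes a fixed list of keys (optionally with associated values) and generates a C/C++ lookup function that is collision-free for exactly that key set, which fits this case because all the keys are known up front. The input file below is a hand-written sketch using a few of the example keys; the file name and the handful of entries are placeholders for a generated list.

```
%{
#include <string.h>
%}
%struct-type
struct entry { const char *name; int value; };
%%
"125 458 699 sadsadasd 5", 56
"125 458 699 sadsadasd 3", 57
"125 458 699 sadsadasd 4", 58
"125 458 699 gsdfsds 3", 89
%%
```

Running gperf keys.gperf > keys.c emits a hash function plus an in_word_set(str, len) lookup that returns a pointer to the matching entry (or a null pointer), with no collision handling needed at run time.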