For an application similar to yours, I've had success with an open-addressing scheme.
A chained hash table keeps a separate list of the entries that map to each bucket. Collisions make the lists grow, and the list nodes are distinct heap objects with poor cache locality.
An open-addressed table stores everything in one flat array: on a collision, the entry goes into another slot of the same array (for example the next free one, with linear probing). That's good for the CPU caches.
For extra performance, consider a function closer to a perfect hash: one that avoids needlessly scrambling the keys and minimizes randomness. Instead, try to identify items that are accessed close together in time and map that temporal locality to spatial locality, so related entries land in nearby slots.
Such an optimized hash still needs to be uniform over its range, though. I used a pre-analysis step, randomly sampling the key domain to drive the construction of the hash function. But adding that kind of complexity should be driven by measurement: profile first and know exactly how much time you're losing to those cache misses before you pay for the fix.