Need associative array that is fast when inserting, finding nearest key, and iterating in key order

https://stackoverflow.com/questions/14280671

15-01-2022
|

Question

I am performing something similar to an N-dimensional convolution, but will be combining values that are close to one another as I proceed, to save memory and time.

I look for a key in the array.
If I find the key, I add to the value stored at that key.
If I do not find the key, I find the next highest and next lowest key.
If the closer of the two neighbors is close enough, then I accumulate with that key-value pair.
Otherwise I add a new key-value pair.

The key is a double. It is always positive and never infinite. (I handle zeroes specially.) I expect the values to range from pennies to as high as 100 billion. The rounding coarseness will change as the algorithm proceeds to maintain a maximum array size between 10,000 and 1,000,000. (Only testing will reveal the sweet spot in the trade-off between speed, memory and accuracy.) Because of the range of values versus array size, direct addressing is not practical; I need sparse storage.

The naive approach is to use a List and perform a BinarySearch to find the key or insertion point, then proceed from there. This is fast for finding the nearest key, can be iterated in key order, but inserts are horrible. (I do not need to perform deletes! Each iteration in the outer loop creates a new list from scratch.)

What data structure is recommended? Wikipedia mentions a few, like Trie, Judy array, etc.

(I implemented something Trie-like with similar characteristics years ago, but that was in java, took me a week to implement, and was tricky. I am crunched for time.)

UPDATE:

The suggestion of SortedSet causes me to modify my requirements. While finding the next lowest and next highest key was my way of accomplishing my task, SortedSet.GetViewBetween goes about things differently. Since I just want to see if there is a value close enough to be aggregated with, and I have a certain rounding granularity G, I can just ask for all elements of interest using

var possibilities = mySet.GetViewBetween(x - G, x + G)

If that set is empty, I need to add. If it is not, it is probably a small set and I iterate through it.

I need to perform performance testing to see if it is fast enough. But even if it does not, another collection that has the same contract is an acceptable alternative to FindNextHighestKey and FindNextLowestKey.

UPDATE 2:

I have decided to use plain Dictionary, and force the keys into buckets using a custom rounding function. Iterating the items in sorted order is not crucial, and by using this rounding function, I can find "close enough" values with which to aggregate. I will not change the granularity during an iteration; I will adjust it every time I finish convolving with a new dimension. Each iteration I create a new array to hold the results of that pass.

Solution

If your key is unique you may look at Dictionary<TKey,TValue> or SortedDictionary<TKey,TValue>

OTHER TIPS

I found this question, which let me to SortedSet<T>.

If you can handle O(log(n)) for insert, delete, and lookup, this might be where you should keep your keys.

Based on your new requirement... Why not just map the doubles by the granularity to sparse keys before use and go with a Dictionary<double, T> ? This won't work if you want the granularity to change during runtime, but neither would the other approach really.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow