What linked list data structure adjustments would give me fast random lookup?

https://cs.stackexchange.com/questions/13620

16-10-2019
|

문제

I am presently using an doubly linked list (C++ std::list) to hold a bunch of records that each have a unique integer identifier. The linked list is created in sorted order such that in the list, the next item always has a larger unique identifier than its predecessor.

The issue I'm facing is that occasionally I need to be able to insert an item quickly into its relative sorted position and using a plain linked list means this operation is $O(n)$ which is causing performance issues for me. Generally, this would mean I want to use something like a binary tree (C++ std::map), however, I am also depending upon the following feature of a doubly linked list for good performance:

Ability to splice a contiguous section out of one linked list into another in $O(1)$ time. (Amortized $O(1)$ or $O(\log \log n)$ would be good enough.)

One feature of my data that I would like to leverage is that I often have long ranges of contiguous records where each one's unique integer is exactly one more than its predecessor. When searching for an item's relative sorted position, it would always be outside such contiguous records since there are no duplicate identifiers.

I'd like to find a replacement data structure or augmentation to a doubly linked list that will allow me to continue to splice whole sections from one list to another in constant time but allow me to locate the sorted position in which to insert a new record in better than $O(n)$ time.

Other operations include forward and backward iteration across the items. The record indexes begin at zero and grow upwards towards 64 bits, generally sequentially, and the code works well in such cases. Occasionally some records are not available before subsequent ones, it is the insertion of these missing records that causes the performance issues now.

One possible approach that occurs to me is to cache the location of several indexes. The cache would get invalidated whenever a splice removes items that might overlap the cached entries. With this cache, instead of doing a linear search, the search could instead begin from the cache point iterator whose unique index is closest to the one whose position is being searched for. However, I'd like to more fully utilize the feature of the contiguous records. I also thought about a hierarchical linked list where I have a top level linked list of contiguous regions, where each region is a linked list of records that are consecutive, but I didn't see a clean way to adapt a linked list to provide this functionality. Perhaps something like this has been done before? I find skip lists to be close, but do not see the splice() functionality, plus a generic skip list would not leverage the fact that insertion never occurs within contiguous records.

해결책

One simple approach might be to use a doubly-linked list of extents, where each extent represents a sequence of contiguous records. The records within each extent could in turn be represented with a doubly linked list.

This preserves your ability to do $O(1)$ time splicing, and now the insert operation takes $O(k)$ time, where $k$ is the number of extents (rather than $O(n)$, where $n$ is the number of records). If you have many fewer extents than records, this might be a partial improvement.

I don't know whether this will be better than a simple skip list or binary tree.

Note that if you use a binary tree, you can still do efficient splicing. The splicing operation is no longer $O(1)$ time, but it can be done in $O(\log \ell)$ time, where $\ell$ is the number of records in the spliced segment. This is not as fast as $O(1)$ time splicing, but depending upon the relative frequency of your different operations, it is another data structure you could consider (e.g., benchmark on realistic data sets).

And, of course, you could combine these ideas, e.g., with a binary tree of extents, where each extent is in turn a doubly-linked list of contiguous records. Inserts will take $O(\lg k)$ time, and splicing can be done in $O(\lg \ell)$ time (both of which might potentially be better than the $O(\lg n)$ achieved through a simple binary tree).

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 cs.stackexchange