Question

I am rather interested in using a skip list as the open list for A*. However, what bugs me is its probabilistic nature. Open lists can vary from very small sets up to huge numbers of nodes, and the structure needs to maintain performance across that whole range.

As I understand it, skip lists have a higher chance of randomly producing a bad layout for small data sets. I think this might become problematic when large numbers of short paths are being generated.

To fix this, I was thinking of watchdogging the random numbers to a degree: keep a running total of the number of nodes at each level and, in order to maintain the ideal distribution of nodes between levels, occasionally intervene and force a node to a specific level.
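Concretely, I had in mind something like the sketch below (Python; the intervention threshold of 16 nodes and the tolerance of one node are arbitrary values I picked just to illustrate the idea):

```python
import random

MAX_LEVEL = 8
P = 0.5  # promotion probability of the usual coin-flip scheme

level_counts = [0] * (MAX_LEVEL + 1)  # level_counts[lvl] = nodes of height lvl
total_nodes = 0

def random_level():
    """Standard geometric level generation from textbook skip lists."""
    level = 1
    while level < MAX_LEVEL and random.random() < P:
        level += 1
    return level

def watchdog_level():
    """Like random_level(), but override the coin flips when some level
    is underrepresented versus the ideal n * (1 - P) * P**(lvl - 1)."""
    global total_nodes
    level = random_level()
    if total_nodes >= 16:  # only intervene once there is enough data
        for lvl in range(MAX_LEVEL, 1, -1):
            ideal = total_nodes * (1 - P) * P ** (lvl - 1)
            if level_counts[lvl] + 1 < ideal:  # tolerance of one node
                level = lvl  # force the starved level
                break
    level_counts[level] += 1
    total_nodes += 1
    return level
```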

I am unsure just how well this would work in my application, and whether I should instead focus on another data structure for my open lists.

None of the articles I have read on skip lists mention such an optimization. Since I am fairly new to the whole performance-profiling game, I am hesitant to try to improve on a documented data structure.

Solution

  1. Yes, you are correct - skip lists do have a higher chance of degenerating when they are small.
    In general, according to this paper, there is a constant alpha such that the probability of a skip list exceeding alpha * n space is less than 1/2^Ω(sqrt(n)) - so as the skip list grows, the probability of exceeding this limit gets smaller and smaller.

  2. To avoid the skip list's worst cases, one can use a variation on the original structure, the deterministic skip list. This subject is discussed in this thesis.

  3. Other alternatives are balanced BSTs, such as AVL or red-black trees, or even B+ trees (which are usually preferred for file systems).

  4. If your "open set" is indeed so small, odds are your branching factor is very small as well (close to 1) - or, to be exact, B^d (d = depth of the solution; B = branching factor) will be small as well. For example, B = 1.2 and d = 20 gives B^d < 40 nodes. This will result in fast lookups regardless of the skip list implementation, because few nodes are expected to be needed.

OTHER TIPS

I would suggest that you write a program that randomly generates skip lists of the lengths you expect your A* algorithm to create, and then inspect those lists to see how many of them are less than optimal; a sketch of such an experiment follows.
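A minimal sketch of such an experiment in Python (here "less than optimal" means the tallest tower ends up at least two levels below the roughly log2(n) height an ideally balanced list would have - just one reasonable proxy; substitute whatever metric matters for your workload):

```python
import math
import random

P = 0.5
MAX_LEVEL = 16

def node_levels(n):
    """Levels assigned to n nodes by the standard coin-flip scheme."""
    levels = []
    for _ in range(n):
        level = 1
        while level < MAX_LEVEL and random.random() < P:
            level += 1
        levels.append(level)
    return levels

def suboptimal_fraction(n, trials=10_000):
    """Fraction of n-node lists whose tallest tower sits at least two
    levels below the ideal height of about log2(n)."""
    ideal = max(1, math.floor(math.log2(n)))
    bad = sum(1 for _ in range(trials)
              if max(node_levels(n)) <= ideal - 2)
    return bad / trials

for n in (8, 16, 64, 256):
    print(f"n={n:4d}: suboptimal in {suboptimal_fraction(n):.3%} of trials")
```

Running this for the list sizes your searches actually produce will tell you quickly whether the degenerate cases are frequent enough to matter.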

I would not recommend trying to create a production skip list data structure that has the monitoring that you proposed. You'll likely find that the monitoring and intervention code will cause the skip list to perform poorly in the general case.

When you say "skip lists have a higher chance of randomly producing a bad layout for small data sets", exactly what are you afraid of?

What you shouldn't be afraid of is that, for a list of, say, 10 elements, there aren't enough level-2 or level-3 nodes to speed up the traversal. The difference between iterating a linked list (which is what a skip list with no level-2+ nodes boils down to) of 2 elements and one of 10 elements is basically non-existent, even in tight loops (the node-reference administration required by your data structure will probably have more of an impact).

Obviously, as you add more elements, the impact of not having enough higher-level nodes increases. But luckily, the chance of obtaining higher-level nodes also increases. For example, if you add 8 elements, the chance of all of them being level-1 nodes is 0.5^8 = 0.39%, i.e. extremely unlikely.
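The same observation in closed form, as a trivial Python check (the list of sizes is arbitrary):

```python
# Chance that every one of n inserted nodes stays at level 1 with p = 0.5,
# i.e. that the skip list degenerates into a plain linked list.
for n in (2, 4, 8, 16, 32):
    print(f"n={n:2d}: P(all level 1) = {0.5 ** n:.4%}")
# n=8 prints 0.3906%, the 0.39% figure quoted above.
```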

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow