Question

I was reading about skip lists and MemSQL and was wondering why skip lists are not more widely used in databases? Are there any major disadvatages to using skiplists?

Était-ce utile?

La solution

Databases are typically so huge that they have to be stored in external memory, such as a giant disk drive. As a result, the bottleneck in most database applications is the number of times that we have to do a memory transfer from the disk drive into main memory.

B-trees and their variants are specifically designed to minimize the number of block reads and writes necessary to perform each of their operations. Mathematically, the number of memory transfers required for each B-tree operation is O(log n / log B), where B is the block size. Compare this to a skiplist, which requires O(log n) memory transfers on expectation. Since B is usually measured in megabytes, log B can be in the neighborhood of 15 - 25, so the B-tree can be significantly faster. Even when the database is in main memory, the effect of the memory hierarchy (L1 and L2 caches, etc.) can be so pronounced that B-tree variants are still faster in practice than many other data structures. This Google blog post gives some background on this.

Although each operation on a B-tree typically requires more CPU work than corresponding operations in other data structures, the fact that they require so few memory transfers tends to make them significantly faster in practice than other data structures. Therefore, it would not be advisable to use a skip list in a database.

There's another reason B-trees are nice: they're worst-case efficient. Although deterministic skip lists do exist, most skiplist implementations are randomized and give expected guarantees on their behavior. In a database, this might be unacceptable because many use cases on databases require worst-case efficient behavior.

Hope this helps!

Autres conseils

Though its late in the game but I felt the urge to reply as its top rated answer and perhaps doesn't convey complete message.

Skip lists differ from balanced tree data-structure as it allows combining several lists efficiently. In data-base terms it allows indexes based on skip-lists to be combined efficiently. A good example is Lucene which powers search engines like Solr/ElasticSeach. https://issues.apache.org/jira/browse/LUCENE-866.

B-Tree has problems in combining multiple indexes without indexing the overall combination a-priori which is not efficient as it requires re-indexing of historical records.

Hence whenever data-store has to support arbitrary queries on data skip lists are an ideal choice.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top