Question

I know there is bulk loading in b+tree. I just wanted to know is there any algorithm for bulk loading in B-Tree. For example given an array of data what is the best way to create a B-Tree?

Était-ce utile?

La solution

Actually The answer is yes.

The main difference between B+-trees and plain B-trees is that the values are actually stored in the leaves for the former, while in the later you will find values in every nodes. Hence B+-trees let you store data in an almost continuous manner, each leaf containing a contiguous slice of the whole sorted data. This cannot be true for B-trees: an inner node will contain several elements, but they won't be contiguous wrt. the whole sorted dataset.

This property is essential for bulk loading: that process works on an already sorted dataset by cutting it into the arrays which will form the leaves of the B+-tree. Thus for a B-tree it looks like it cannot work.

If we can sort the data in a way which group together inner nodes elements, then the problem is solved. In order to do that, one must know in advance how the elements will be grouped. This turns out to be possible.

Let's call o (order) the minimal number of children in a node (that's consistent with the original definition of a B-tree order). We consider the root node to be at the highest stage of the tree, and the leaves to be at the lowest (stage 0). For a well balanced tree, all the leaves will indeed be at the same stage.

The stage k of the tree groups elements which are spaced by at least o elements in the stage k-1. After an initial sorting, we must extract elements from the sorted array, which constitutes stage 0, and group them in a different array to build stage 1, then do that again with that array into a new array for stage 2, and repeat the process until there's less than o elements in the newest array, which will be the root stage. From then on, it is possible to build the tree directly from the stage set:

  • split each stage in arrays of o elements,
  • build index arrays to link nodes to subnodes
  • build each node as the pair of corresponding index array * value array

The resulting tree will not necessarily be well balanced. It depends on the number of entries in the dataset, and o. It should be possible to tune the interval used in building the stages to have a better distributed tree though.

All in all the work needed to bulk load a B-tree is more tedious than for B+-tree, but it is possible.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top