Question

I need a bit of help with my math, which hurts my brain these days.

I keep a separate pool for each node size, shared across many btrees. That works out well if btrees tend to have the same average number of keys per node for large trees as for small ones. However, if the distributions differ, I could end up in a situation where too many free nodes of one size sit in their pool while there is a shortage of others.

Rather than modifying nodes in place, every change creates a new node sized for the new number of keys and overwrites the old node in the tree with it. When a node passes the maximum number of keys per node, it is split evenly.

I intuitively think the distribution of node sizes will be the same for large and small trees (very small trees aside). But I know better than to trust my intuition. Is that a reasonable assumption, or does the percentage of nodes with a given key count change with the size of the tree?


Solution

The classic implementation of a btree uses a single radix (maximum number of keys per node) across the whole tree, splits a node when it fills, and joins nodes when fill drops below a chosen threshold. One benefit btrees have over binary (AVL) trees is that rebalancing work is minimized. The choice of radix, the mix of key insertions and deletions, and the distribution of keys added to the tree all affect the fill ratio. Using nodes of different sizes would reduce the number of splits per node, especially when adding ordered data to the tree, but rebalancing the tree already addresses the problem ordered data causes.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow