How to enter records with keys to an initially empty B+ tree?

https://dba.stackexchange.com/questions/172669

btree

07-10-2020
|

Question

Show the result of entering the records with keys in the order (1, 2, 3, 4, 5) to an initially empty B+ –tree of order m = 3. In case of overflow, split the node and do not re-distribute keys to neighbors. Is it possible to enter the records with keys in a different order to have a tree of less height?

From Relational DBMS Internals, chapter 5: Dynamic Tree-Structure Organizations, p.50

I'm not good on this yet I tried to do ≤ on the left and > on the right :

Until insertion of 1,2 :

Then as far as we have to split the node and not to redistribute keys to neighbors (I understand it as son-nodes) I inserted only on the right of cell with 2 :

And I continued doing the same with when inserting 5 :

But this is quite odd, I never saw empty nodes like these ... And I don't know if it respects some very basic B-trees properties :

each node has at most (m-1) keys and at least (⌈(m/2)⌉-1) keys unless a key can be empty and I would understand the key as a "pointer".

First attempt : error on the order revealed an ambiguous tree

At the beginning I misunderstood what an "order" was (the max number of children per node). So I thought a node could have three spaces (and therefore 4 children. I was creating a tree of order 4 I think :

Until insertion of 1,2,3 :

Insertion of 4, as far as we have to split the node and not to resdistribute keys to neighbours (which seems contradictory) I would let 1,2,3 and 4,5 on the right leaf after 3 :

Solution

I think you have your page creation upside down. When a node splits it doesn't create more nodes down the hierarchy (son nodes in your nomenclature). Instead it creates more upwards, toward the root. As the book says

Note that the growth is at the top of the tree, and this is an intrinsic characteristic of a B–tree to ensure the important properties that it always have all the leaves at the same level, and each node different from the root is at least 50% full.

(My emphasis.)

From the linked ebook:

Definition 5.1 A B–tree of order m (m ≥ 3) ... each node contains at most m − 1 keys

The exercise is for m=3, so at most 2 keys per node.

The first two keys are easy -- they go into the first page:

A:[1,2]

I'm going to use ASCII art. I'll label each page in the sequence they're created and show the keys/ pointers within the page. So page P containing key values k1 and k2 will be P:[k1,k2].

Now key 3 comes along. According to Section 5.2.1 ... Insertion, the first task is to search. This determines key 3 should be on page A - the only page we have. Further "if [that node] is full, it will be split into two nodes." The page is full so it must split. We now have

A:[1,2]    B:[3, ]

But this is not a tree! As the book says:

the pointer to [the new node], .. is inserted into the father node .. of [the current node], repeating the insertion operation in this node [i.e. the father node]. This splitting and moving up process may continue if necessary up to the root, and if this must be split, a new root node will be created ..

(My emphasis to show processing continues up the tree toward the root, not down toward leaves.)

So we must put a pointer to the new page (B) into the father of the current page (A). There must be a new root node:

      C:[2,3]
      /     \
A:[1,2]    B:[3, ]

I have the pointers in non-leaf pages pointing to the highest value in a child (son) node. Your linked text may do this differently but the result will be equivalent.

Key value 4 arrives; following the algorithm we search to find which page it should be on. It should be page B. There is room for it so we update that page and the pointer on page C:

      C:[2,4]
      /     \
A:[1,2]    B:[3,4]

Next we insert key 5. It should go in page B but it's full. Therefore it splits

      C:[2,4]
      /     \
A:[1,2]    B:[3,4]   D:[5, ]

We must update the father node. It, too, is full so it splits:

      C:[2,4]    E:[5, ]
      /     \         \
A:[1,2]    B:[3,4]   D:[5, ]

The split propagates up and a new root node forms:

            F:[4,5]
            /     \
      C:[2,4]    E:[5, ]
      /     \         \
A:[1,2]    B:[3,4]   D:[5, ]

By growing upwards the tree maintains identical depth in every branch. This is important for predictable performance. (Some say the B in B-Tree stands for "balanced" for this very reason.)

As for the second part -- "Is it possible to enter the records with keys in a different order to have a tree of less height?" With 5 keys and two keys per node we need at least 3 leaf nodes to hold all values and a height of 3 to form the tree. So my arrangement is optimal for the given data, sequence and algorithm.

The book uses a very different pointer arrangement to what I use, and a different page split arrangement. This will be significant, leading to part-full pages. That there is a section on page 42 called "Data Loading" that shows how fuller pages can be achieved by loading out of key sequence supports my hunch. However, I hope I've given you sufficient pointers and you'll be able to use the book's pointer structure to work that out for yourself.

I've come across this interactive simulation of how a B-Tree grows.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange