When converting to a red-black tree, is there any reason to choose one form over another?

https://stackoverflow.com/questions/1677011

16-09-2019
Question

I have a library of linked list/binary tree methods for use when standard containers don't fit - e.g. when there are different types of nodes, or when I need to convert from binary tree to list and back again. It includes red-black tree handling.

One of the methods converts from a doubly linked list to a perfectly balanced simple binary tree in O(n) time (given that the number of items is known in advance). The algorithm is known as "folding" - it's the second half of a binary tree rebalancing algorithm that was once published in Dr. Dobb's Journal. The steps are basically...

1. Given the size of the tree, decide on the sizes of the left and right subtrees
2. Recurse for the left subtree
3. Pop a node from the list to use as the root
4. Recurse for the right subtree
5. Link the subtrees to the root
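
The steps above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the `Node` class, `fold`, `make_list`, and the helpers are hypothetical names, and the even split `(size - 1) // 2` is just one reasonable way to decide the subtree sizes.

```python
class Node:
    """Hypothetical node: left/right act as prev/next while in list form."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def fold(head, size):
    """Consume `size` list nodes from `head`; return (subtree_root, rest_of_list)."""
    if size == 0:
        return None, head
    left_size = (size - 1) // 2            # decide the subtree sizes
    right_size = size - 1 - left_size
    left, head = fold(head, left_size)     # recurse for the left subtree
    root, head = head, head.right          # pop a node to use as the root
    right, head = fold(head, right_size)   # recurse for the right subtree
    root.left, root.right = left, right    # link the subtrees to the root
    return root, head


def make_list(keys):
    """Build a doubly linked list of nodes and return its head."""
    head = prev = None
    for key in keys:
        node = Node(key)
        node.left = prev
        if prev is None:
            head = node
        else:
            prev.right = node
        prev = node
    return head


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


root, rest = fold(make_list(range(10)), 10)
```

Each node is visited once, so the conversion is O(n), and the in-order sequence of the tree is the original list order.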

I also have a similar method that creates a red-black tree. The principle is the same, but the recursion keeps track of node height - height zero nodes are created red, all others are black. The starting height calculation is based on the highest set bit in the tree size, and is fiddled so that a perfectly balanced (2^n)-1 sized tree has only black nodes (the recursion only goes down to height one).

The point here is that I only have red nodes at the leaf level, and a maximum of precisely half the nodes are red.
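
A rough sketch of that red-black variant, under some assumptions: keys are popped from a Python iterator rather than from a linked list, the names `build_red_black`, `black_height`, and `count_red` are made up for illustration, and the starting height is taken as the position of the highest set bit of `n + 1`, which is one way to get the "all black for size (2^n)-1" behaviour described above.

```python
class Node:
    def __init__(self, key, red):
        self.key = key
        self.red = red
        self.left = None
        self.right = None


def build_red_black(keys):
    keys = list(keys)
    n = len(keys)
    # Number of completely filled levels; these all become black.
    start_height = (n + 1).bit_length() - 1
    it = iter(keys)

    def build(size, height):
        if size == 0:
            return None
        left_size = (size - 1) // 2
        left = build(left_size, height - 1)
        node = Node(next(it), red=(height == 0))  # height zero nodes are red
        node.left = left
        node.right = build(size - 1 - left_size, height - 1)
        return node

    return build(n, start_height)


def black_height(node, parent_red=False):
    """Return the black height; raise ValueError if a red-black rule is broken."""
    if node is None:
        return 0
    if node.red and parent_red:
        raise ValueError("red node with a red parent")
    lh = black_height(node.left, node.red)
    rh = black_height(node.right, node.red)
    if lh != rh:
        raise ValueError("unequal black heights")
    return lh + (0 if node.red else 1)


def count_red(node):
    if node is None:
        return 0
    return int(node.red) + count_red(node.left) + count_red(node.right)


tree = build_red_black(range(6))
print(black_height(tree), count_red(tree))  # 2 3
```

With an even split, only the partially filled bottom level receives red nodes, so at most half the nodes are red, and a tree of size 7 or 15 comes out entirely black.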

The thing is, while this is a simple way to generate a valid red-black tree, it isn't the only option. Avoiding having all leaves red in a perfectly balanced tree was an arbitrary choice. I could have alternating layers of red and black nodes. Or I could reduce the number of red nodes dramatically in some cases by spotting subtrees that are perfectly balanced and (if the subtree needs red nodes) making the subtree root red instead of all its leaves.

The question is - is there any practical reason to choose one valid red-black tree form over another?

This is pure curiosity - I know I don't have any practical reason - but does anyone know of a specialist application where this choice is significant?

Solution

In the standard analysis of the amortized cost of modifying red-black trees using the physicist's method, black nodes with either zero or two red children are assigned a positive potential of one, meaning that they represent problematic places in the tree where extra work may need to be done. Red nodes and black nodes with exactly one red child are assigned a potential of zero.

So, to reduce the cost of modifications, give every black node one red child.
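
To make the potential concrete, here is a small sketch that totals it for a given tree. The `Node` class and the `potential` function are hypothetical helpers written for this illustration; they only count nodes according to the rule stated above.

```python
class Node:
    def __init__(self, red, left=None, right=None):
        self.red = red
        self.left = left
        self.right = right


def is_red(node):
    return node is not None and node.red


def potential(node):
    """Count black nodes that have zero or two red children."""
    if node is None:
        return 0
    total = potential(node.left) + potential(node.right)
    if not node.red:
        red_children = int(is_red(node.left)) + int(is_red(node.right))
        if red_children != 1:
            total += 1
    return total


all_black = Node(False, Node(False), Node(False))
one_red_child = Node(False, Node(True))
two_red_children = Node(False, Node(True), Node(True))
print(potential(all_black), potential(one_red_child), potential(two_red_children))  # 3 0 1
```

A lower total suggests fewer places where an update can trigger a cascade of recolorings.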

The reason why black nodes with one red child are blessed is explained best by analogy to redundant binary numbers. I will first explain how to relate red-black trees to binary numbers and then I will explain why one-red-child nodes are useful.

As you may know, red-black trees are a way of representing 2-4 trees, in which every simple path from the root to a leaf has the same length but nodes have 2, 3, or 4 children. The simplest algorithm for adding or removing a node in a 2-4 tree is the same algorithm as adding or subtracting one from a redundant binary number.

A redundant binary number is a number in which the ith digit represents 2ⁱ, just as in a standard binary number, but the ith digit can be 0, 1, or 2. They are called redundant because there are multiple ways to write a given number. For example, decimal 4 can be written as 100, 20, or 12.

To add one to a redundant binary number, you increment the least significant digit; if it is 3, set it to 1 and increment the next least significant digit, and so on. The algorithm halts when it encounters a 0 or 1.
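
As a sketch of that increment, assuming the digits are stored least significant first in a Python list (the function names here are illustrative only):

```python
def value(digits):
    """Numeric value of a redundant binary number, least significant digit first."""
    return sum(d << i for i, d in enumerate(digits))


def increment(digits):
    """Add one in place and return how many digits were touched."""
    i = 0
    while True:
        if i == len(digits):
            digits.append(0)       # grow the number when the carry runs off the end
        digits[i] += 1
        if digits[i] < 3:          # the digit was 0 or 1, so the cascade halts here
            return i + 1
        digits[i] = 1              # 3 at position i equals 1 here plus a carry
        i += 1


digits = [2, 2, 1]                 # written 122, value 10
touched = increment(digits)
print(digits, value(digits), touched)  # [1, 1, 2] 11 3
```

The three spellings of 4 mentioned above are `[0, 0, 1]`, `[0, 2]`, and `[2, 1]` in this least-significant-first layout.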

To add a leaf to a 2-4 tree, add a child to its intended parent. If the parent now has five children, split it into two nodes and make them children of its parent. Continue until you reach a node that doesn't need splitting. So, the path towards the root halts when it encounters a node with two or three children.

To bound the amortized cost of incrementing a redundant binary number, use the physicist's method and assign a potential of 1 to each 2 digit. A call to increment that touches k digits releases k-1 potential, giving it an amortized cost of O(1).
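
Spelled out, with $c$ the actual cost of one increment, $\Phi$ the number of 2 digits, and $k$ the number of digits touched (symbols chosen here for illustration), the bound works out roughly like this:

```latex
% The first k-1 touched digits were 2 and become 1, releasing k-1 units of potential.
% The last touched digit was 0 or 1; it may become 2, adding at most 1 unit.
\begin{align*}
c &= k, \\
\Delta\Phi &\le -(k-1) + 1, \\
\hat{c} = c + \Delta\Phi &\le k - (k-1) + 1 = 2 = O(1).
\end{align*}
```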

That analysis is similar to the amortized analysis of incrementing a standard binary number, but a standard binary number cannot support both increment and decrement in O(1) amortized time. Consider 2^k - 1, which is written as k 1 digits. Incrementing it costs Θ(k). If that is followed by a decrement, the pair costs Θ(k) and brings the number back to its original state, so the same expensive pair can be repeated indefinitely.
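
The thrashing can be seen in a small sketch of a standard binary counter. The bit list is least significant first, the function names are illustrative, and `decrement` assumes the value is greater than zero.

```python
def increment(bits):
    """Add one in place; return the number of bits changed."""
    i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0                # carry through the run of trailing 1s
        i += 1
    if i == len(bits):
        bits.append(1)
    else:
        bits[i] = 1
    return i + 1


def decrement(bits):
    """Subtract one in place (value must be > 0); return the number of bits changed."""
    i = 0
    while bits[i] == 0:
        bits[i] = 1                # borrow through the run of trailing 0s
        i += 1
    bits[i] = 0
    return i + 1


k = 8
bits = [1] * k                     # the number 2^k - 1
up = increment(bits)
down = decrement(bits)
print(up, down)  # 9 9
```

Both operations change k + 1 bits, and the counter ends up back at 2^k - 1, ready for the same expensive pair again.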

Redundant binary is special in that 1 digits halt the cascading operations of both increment and decrement. 2-4 trees are special in that 3-nodes halt the cascading operations of both insert and delete.

In red-black trees, a node with one red child is just a representation of a 3-node in a 2-4 tree. These nodes are special and robust against inserts or deletes in their subtrees, so you should favor them when building red-black trees that will see a lot of updates.

If you know you will see only inserts, favor nodes with two black children. If you know you will see only deletes, favor nodes with two red children.

OTHER TIPS

The short answer is: it depends.

Basically, any valid tree will suffice. However, in terms of amortized analysis, you may well want to choose the form of tree that gives the best behavior in the long run.

For example, if you always choose a valid tree, but one that is prone to lots of rebalancing operations, you will get bad amortized performance. An obvious example is a fully black tree, which is perfectly valid, yet performs badly when modified.

It depends, because this usually will be application-specific.

Licensed under: CC-BY-SA with attribution
