How to generate maximally unbalanced AVL trees

Question 1

Basic Solution

Fibonacci trees have several properties that can be used to form a compact Fibonacci tree:

Every node in a Fibonacci tree is itself a Fibonacci tree.
The number of nodes in a Fibonacci tree of height n is equal to F_n+2 - 1.
The number of nodes in between a node and its left child is equal to the number of nodes in the node's left child's right child.
The number of nodes in between a node and its right child is equal to the number of nodes in the node's right child's left child.

Without loss of generality, we will assume our Fibonacci tree has the following additional property:

If a node has height n, then the left child has height n-2, and the right child has height n-1.

Combining these properties, we find that the number of nodes in between a node of height n and its left and right children is equal to F_n-1 - 1, and we can use this fact to generate a compact Fibonacci tree:

static int fibs[] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170};

void fibonacci_subtree(int root, int height, int *fib)
{
    if (height == 1) {
        insert_into_tree(root);
    } else if (height == 2) {
        insert_into_tree(root + *fib);
    } else if (height >= 3) {
        fibonacci_subtree(root - *fib, height - 2, fib - 2);
        fibonacci_subtree(root + *fib, height - 1, fib - 1);
    }
}

...

for (height = 1; height <= max_height; height++) {
    fibonacci_subtree(0, height, fibs + max_height - 1);
}

This algorithm generates the minimum number of nodes possible for a given height, and it also produces the minimum possible range. You can shift the range around by making the root node something other than zero.

Compact Fill Algorithm

The basic solution only produces Fibonacci trees, which always have F_n+2 - 1 nodes. What if you want to generate an unbalanced tree with a different number of nodes while still minimizing the range?

In that case, you need to generate the next larger Fibonacci tree with a few modifications:

Some number of elements at the end of the sequence will not be inserted.
Those elements will create gaps, and the location of those gaps need to be tracked.
The difference between nodes needs to be reduced appropriately.

Here's one approach that still takes advantage of the recursive nature of the solution:

void fibonacci_subtree(int root, int height, int *fib, int num_gaps, bool prune_gaps)
{
    if(height < 1)
        return;
    if(prune_gaps && height <= 2) {
        if(!num_gaps) {
            if(height == 1) {
                insert_into_tree(root);
            } else if(height == 2) {
                insert_into_tree(root + *fib);
            }
        }
        return;
    }
    if(height == 1) {
        insert_into_tree(root);
    } else {
        int max_rr_gaps = *(fib - 1);
        int rr_gaps = num_gaps > max_rr_gaps ? max_rr_gaps : num_gaps;
        num_gaps -= rr_gaps;

        int max_rl_gaps = *(fib - 2);
        int rl_gaps = num_gaps > max_rl_gaps ? max_rl_gaps : num_gaps;
        num_gaps -= rl_gaps;

        int lr_gaps = num_gaps > max_rl_gaps ? max_rl_gaps : num_gaps;
        num_gaps -= lr_gaps;

        int ll_gaps = num_gaps;
        fibonacci_subtree(root - *fib + lr_gaps, height - 2, fib - 2, lr_gaps + ll_gaps, prune_gaps);
        fibonacci_subtree(root + *fib - rl_gaps, height - 1, fib - 1, rr_gaps + rl_gaps, prune_gaps);
    }
}

The main loop is slightly more complicated to accommodate an arbitrary range of keys:

void compact_fill(int min_key, int max_key)
{
    int num_nodes = max_key - min_key + 1;
    int *fib = fibs;
    int max_height = 0;

    while(num_nodes > *(fib + 2) - 1) {
        max_height++;
        fib++;
    }

    int num_gaps = *(fib + 2) - 1 - num_nodes;

    int natural_max = *(fib + 1) - 1;
    int max_r_gaps = *(fib - 1);
    int r_gaps = num_gaps > max_r_gaps ? max_r_gaps : num_gaps;
    natural_max -= r_gaps;

    int root_offset = max_key - natural_max;

    for (int height = 1; height <= max_height; height++) {
        fibonacci_subtree(root_offset, height, fibs + max_height - 1, num_gaps, height == max_height);
    }
}

Closed Form Solution

If you look at the differences between every pair of words generated by the basic solution, you find that they alternate between two sequential elements of the Fibonacci sequence. This alternating pattern is defined by the Fibonacci word:

A Fibonacci word is a specific sequence of binary digits (or symbols from any two-letter alphabet). The Fibonacci word is formed by repeated concatenation in the same way that the Fibonacci numbers are formed by repeated addition.

It turns out there's a closed-form solution for the Fibonacci word:

static double phi = (1.0 + sqrt(5.0)) / 2.0;

bool fibWord(int n)
{
    return 2 + floor(n * phi) - floor((n + 1) * phi);
}

You can use this closed-form solution to solve the problem using two nested loops:

// Used by the outer loop to calculate the first key of the inner loop
int outerNodeKey = 0;
int *outerFib = fibs + max_height - 1;

for(int height = 1; height <= max_height; height++) {

    int innerNodeKey = outerNodeKey;
    int *smallFib = fibs + max_height - height + 3; // Hat tip: @WalterTross

    for(int n = fibs[height] - 1; n >= 0; n--) {
        insert_into_tree(innerNodeKey);

        // Use closed-form expression to pick between two elements of the Fibonacci sequence
        bool smallSkip = 2 + floor(n * phi) - floor((n + 1) * phi);
        innerNodeKey += smallSkip ? *smallFib : *(smallFib + 1);
    }

    if(height & 0x1) {
        // When height is odd, add *outerFib.
        outerNodeKey += *outerFib;
    } else {
        // Otherwise, backtrack and reduce the gap for next time.
        outerNodeKey -= (*outerFib) << 1;
        outerFib -= 2;
    }
}

Question 2

I have found this answer to my question, but I still hope that a simpler and, especially, more time-efficient and not less space-efficient algorithm can be found, hopefully with much better key range properties too.

The idea is to generate the Fibonacci trees up to a given height (which must be known beforehand), completely avoiding all tree rotations. Intermediate trees are kept AVL-balanced by the choice of the insertion order. Since they have the height of the lower of the two Fibonacci trees they link, they are all maximally unbalanced.

Insertions are done by virtually inserting all nodes in the sequence of Fibonacci trees, but, for each virtual tree, effectively inserting only the nodes which are subtrees of height 1. These are the "incremental" nodes between two consecutive Fibonacci trees.

Here is how it works in the case max_height = 5:

insert 0
=> Fibonacci tree of height 1 (1 node):
                0
insert 8
=> Fibonacci tree of height 2 (2 nodes):
                0
                        8
insert -8
insert 12
=> Fibonacci tree of height 3 (4 nodes):
                0
       -8               8
                           12
insert -4
insert 4
insert 14
=> Fibonacci tree of height 4 (7 nodes):
                0
       -8               8
           -4       4      12
                             14
insert -12
insert -2
insert 6
insert 10
insert 15
=> Fibonacci tree of height 5 (12 nodes):
                0
       -8               8
  -12      -4       4      12
             -2       6  10  14
                              15

And here is the code (simplified):

void fibonacci_subtree(int root, int height, int child_delta)
{
   if (height == 1) {
      insert_into_tree(root);
   } else if (height == 2) {
      insert_into_tree(root + child_delta);
   } else if (height >= 3) {
      fibonacci_subtree(root - child_delta, height - 2, child_delta >> 1);
      fibonacci_subtree(root + child_delta, height - 1, child_delta >> 1);
   }
}

...
   for (height = 1; height <= max_height; height++) {
      fibonacci_subtree(0, height, 1 << (max_height - 2));
   }

UPDATE

The solution by godel9 solves the problem of the spread of the keys of this solution. Here is the output of godel9's code:

insert 0
=> Fibonacci tree of height 1 (1 node):
                0
insert 3
=> Fibonacci tree of height 2 (2 nodes):
                0
                        3
insert -3
insert 5
=> Fibonacci tree of height 3 (4 nodes):
                0
       -3               3
                            5
insert -2
insert 1
insert 6
=> Fibonacci tree of height 4 (7 nodes):
                0
       -3               3
           -2       1       5
                              6
insert -4
insert -1
insert 2
insert 4
insert 7
=> Fibonacci tree of height 5 (12 nodes):
                0
       -3               3
   -4      -2       1       5
             -1       2   4   6
                               7

And here is the code in the version closest to mine (here with a static fibs array):

static int fibs[] = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170 };

void fibonacci_subtree(int root, int height, int *fib)
{
   if (height == 1) {
      insert_into_tree(root);
   } else if (height == 2) {
      insert_into_tree(root + *fib);
   } else if (height >= 3) {
      fibonacci_subtree(root - *fib, height - 2, fib - 2);
      fibonacci_subtree(root + *fib, height - 1, fib - 1);
   }
}

...
   for (height = 1; height <= max_height; height++) {
      fibonacci_subtree(0, height, fibs + max_height - 1);
   }

The final Fibonacci tree of height H has F_H+2 - 1 nodes with no "holes" between the key values, and has k_max - k_root = F_H+1 - 1. The key range can be positioned conveniently, if necessary, by offsetting the key value of the root, and optionally exchanging left and right in the algorithm.

What remains unsolved is the compact filling of any given key range with integer keys (while it is trivial for exactly balanced trees). With this algorithm, if you want to make a maximally unbalanced tree with n nodes (with integer keys), where n is not a Fibonacci number - 1, and you want the smalles possible range of keys, you can find the first height that can accomodate n nodes, and then run the algorithm for this height, but stopping when n nodes have been inserted. A number of "holes" will remain (in the worst case ca. n/φ ≅ n/1.618).

Contrary to what my intuitive understanding was when I wrote the introduction to this solution, the time-efficieny of this algorithm, with or without godel9's important improvement, is already optimal, since it is O(n) (so that when the insertions are included, it becomes O(n log n)). This is because the number of operations is proportional to the sum of the nodes of all Fibonacci trees from T_F₁ = T₁ to T_{F_H} = T_n, i.e., N = Σ_h=1...H(F_h+2 - 1) = F_H+4 - H - 1. But two consecutive Fibonacci numbers have the asymptotic ratio φ ≅ 1.618, the golden ratio, so that N/n ≅ φ² ≅ 2.618. You can compare this with completely balanced binary trees, where very similar formulas apply, only with a "logarithm" of 2 instead of φ.

Although I doubt that it would be worthwile to get rid of the φ² factor, considering the simplicity of the current code, it's still interesting to note the following: when you add the "incremental" nodes of any intermediate Fibonacci tree of height h, the difference between two consecutive keys of this "Fibonacci frontier" (my term) is either F_H-h+3 or F_H-h+4, in a peculiar alternating pattern. If we knew a generating rule for these differences, we could possibly fill the tree simply with two nested loops.

Question 3

Interesting question. It looks like you already have a good solution, but I would find a more combinatorial approach easier.

Assumptions:

Let U(n) represent the number of nodes in a maximally unbalanced AVL tree of height n.
U(0) = 0
U(1) = 1
U(n) = U(n-1) + U(n-2) + 1 for n>=2 (i.e., a root node plus two maximally unbalanced subtrees)
For convenience, let us assume that U(n-1) is always the left subtree, and U(n-2) is always the right.
Let each node be represented by a unique string representing the path from root to node. The root node is the emptry string, the level 1 nodes are "L" and "R", the level 2 nodes are "LL", "LR", "RL", and "RR", etc.

Conclusions:

A valid string for a node at level A in U(n) is A characters long and satisfies the inequality: n - count("L") - 2 * count("R") >= 1
count("L") + count("R") = A or count("L") = A - count("R")
Thus count("R") <= n - A - 1

We can use the following functions to generate all valid paths at a given level and to determine the key value at each node.

void GeneratePaths(int height, int level)
{
  int rLimit = height - level - 1;
  GeneratePaths(height, rLimit, level, string.Empty, 0);
}

void GeneratePaths(int height, int rLimit, int level, string prefix, int prefixlen)
{
  if (prefixlen + 1 < level)
  {
    GeneratePaths(height, rLimit, level, prefix + "L", prefixlen + 1);
    if (rLimit > 0)
        GeneratePaths(height, rLimit - 1, level, prefix + "R", prefixlen + 1);
  }
  else if (prefixlen + 1 == level)
  {
    InsertNode(prefix + "L", height)
    if (rLimit > 0)
        InsertNode(prefix + "R", height);
  }
}

void InsertNode(string path, int height)
{
  int key = fibonacci(height);
  int index = height - 2;

  for (int i=0; i < path.length(), i++)
  {
    int difference = fibonacci(index);
    char c = path.charAt(i);
    if (c == 'L')
    {
      key -= difference;
      index -= 1;
    }
    else if (c == 'R')
    {
      key += difference;
      index -= 2;
    }
  }

  InsertKey(key);
}

If you use these functions to generate a U(5) tree, you would get this result. (Note that the keys on the left edge of the tree follow the fibonacci sequence from 1 to 5,)

            5
      3           7
   2     4      6   8
  1 3   4      6
 1