Question

guys. I think I've created an AVL tree implementation, but as AVL Tree is quite a complex structure, I need to test it. So the question is - how can I test it? Have you got any ideas? Up to this moment I have the following tests:

  1. basic sanity check - checks that for every node height equals max. height of child nodes + 1, balance is in [-1, 1], left child's key < this node's key < right child's key, and there are no circular references (like left child of a node is a node himself);

  2. check that inorder traversal on an AVL tree (and on a binary search tree in the whole) will return values from the underlying set in order;

  3. check that an AVL tree's height is strictly less than 1.44*log2(N+2)-1 (there N is number of elements) - proved by AVL tree creators;

  4. visual check - doesn't work that well, I try to draw a tree (rootnode in the first line, his direct children on the next line, childen of rootnode's direct childen on the third line and so on), but that works only on small trees, for big trees it becomes a complete mess;

  5. (?????) Russian wikipedia says that it is proven experimentally, that for two insertions one rebalancing needed and for five removals also one rebalancing needed, but is it really so? English wikipedia says nothing about it, and for my AVL one rebalancing needed for two insertions or for four removals, which is not quite the same.

Maybe these tests are enough, but if there are any more tests, not difficult to implement, why not to do it?

Was it helpful?

Solution

A key property of an AVL tree is that each of its sub-trees is also an AVL tree. That means that covering the basic scenarios should give you a broad coverage of the AVL tree functionality.

In other words, these tests done on the smallest tree structure that allows them are the most important ones:

  • Creating a new tree.
  • Inserting the first value.
  • Inserting a bigger value.
  • Inserting a smaller value.
  • Inserting a value that causes LL Rotation.
  • Same for the other rotations.
  • Same for Removing.
  • All the variants of Finding values.

If your implementation passes these tests, it would probably pass them on larger trees. Note that performance and memory usage is not tested here.

OTHER TIPS

In the spirit of all these answers, I thought I'd provide a couple concrete examples to demonstrate that the basic case is not enough.

Insert - Left/Right Rebalance

Consider the following AVL balanced binary trees for an insert operation:

  20+       20+           __20+__
 /         /  \          /       \
4         4    26       4         26
         / \           / \       /  \
        3   9         3+  9    21    30
                     /   / \
                    2   7   11

Inserting either an 8 or a 15 (for example) into any of these trees will trigger essentially the same Left/Right re-balancing, but the end results are significantly different for each tree and insert value. To wit, the final landing place of the inserted value and the final balance factors of node(4) and node(20) are entirely dependent on the relative value of the right child under node(4) - if any. A test solely off any one of these cases does not necessarily prove the correctness of any others. Note: node(4) must initially be balanced for these cases; an initial imbalance in node(4) ultimately has no effect on node(20).

Case 1a: Insert 15

  20+      20++         20++      15
 /        /            /         /  \
4     => 4-     =>   15+     => 4    20
          \         /
           15      4

Case 2a: Insert 15

    20+          20++           20++         9
   /  \         /  \           /  \         / \
  4    26 =>   4-   26 =>     9+   26 =>   4+  20
 / \          / \            / \          /   /  \
3   9        3   9-         4+  15       3  15    26
                   \       /
                    15    3

Case 3a: Insert 15

      __20+__                _20++_                  __20++_                ___9___
     /       \              /      \                /       \              /       \
    4         26    =>     4-       26    =>       9+        26    =>     4+      __20__
   / \       /  \         / \      /  \           / \       /  \         / \     /      \
  3+  9    21    30      3+  9-  21    30        4+  11-  21    30      3+  7  11-       26
 /   / \                /   / \                 / \   \                /         \      /  \
2   7   11             2   7   11-             3+  7   15             2           15  21    30
                                 \            /
                                  15         2

Case 1b: Insert 8

  20+      20++        20++      8
 /        /           /         / \
4     => 4-     =>   8+     => 4   20
          \         /
           8       4

Case 2b: Insert 8

    20+          20++           20++         9
   /  \         /  \           /  \         / \
  4    26 =>   4-   26 =>     9++  26 =>   4   20-
 / \          / \            /            / \    \
3   9        3   9+         4            3   8    26
                /          / \
               8          3   8

Case 3b: Insert 8

      __20+__                _20++_                  __20++_                ___9___
     /       \              /      \                /       \              /       \
    4         26           4-       26             9+        26           4        _20-
   / \       /  \         / \      /  \           / \       /  \         / \      /    \
  3+  9    21    30 =>   3+  9+  21    30 =>     4   11   21    30 =>   3+  7-  11      26
 /   / \                /   / \                 / \                    /     \         /  \
2   7   11             2   7-  11              3+  7-                 2       8      21    30
                            \                 /     \
                             8               2       8

The more complex cases were a problem for me when I was working on optimizing the calculation of balance factors (that is, adjusting balance factors only for affected nodes rather than recalculating the entire tree).

Delete - Double Rebalancing

Now consider these trees for a delete operation:

  2            ___6___               ___5___
 / \          /       \             /       \
1   4        2         9           2         8
   / \      / \       / \         / \       / \
  3   5    1   4     8   B       1   3     7   A
              / \   /   / \           \   /   / \
             3   5 7   A   C           4 6   9   B
                            \                     \
                             D                     C

Delete node(1) from each of these trees. Note that Case 1 effectively proves Case 2, but not at all Case 3.

Case 1

  2          2            4
 / \          \          / \
1   4    =>    4    =>  2   5
   / \        / \        \
  3   5      3   5        3

Case 2

    ___6___                ___6___                 ___6___
   /       \              /       \               /       \
  2         9            2         9             4         9
 / \       / \            \       / \           / \       / \
1   4     8   B     =>     4     8   B      => 2   5     8   B
   / \   /   / \          / \   /   / \         \       /   / \
  3   5 7   A   C        3   5 7   A   C         3     7   A   C
                 \                      \                       \
                  D                      D                       D

Case 3

    ___5___              ___5___                 ___5___                   ____8____
   /       \            /       \               /       \                 /         \
  2         8          2         8             3         8              _5_          A
 / \       / \          \       / \           / \       / \            /   \        / \
1   3     7   A     =>   3     7   A      => 2   4     7   A     =>   3     7      9   B
     \   /   / \          \   /   / \                 /   / \        / \   /            \
      4 6   9   B          4 6   9   B               6   9   B      2   4 6              C
                 \                    \                       \
                  C                    C                       C

The are plenty of examples of AVL rotations in books and on the internet, but what I found seemed arbitrary and no one place seemed to include simple examples for all 4 cases for insert and delete.

These are the simplest test case I could come up with for the 4 kinds of rotations. To make it easy to describe, I’ve used ascii characters as the key so a test case can be expressed as a string. For example, the string "abc" would be insert "a", insert "b" and then insert "c".

The full test cases create some pretty complicated trees so I’ve created two test suites. The first causes the rotations, but has empty sub-trees for the nodes being rotated making it easy to see what actually happened. The second suite has non-empty sub-trees to fully test the rotation code.

There seem to be two different nomanclature for rotates - what I learned as a 2L rotation, some books call an rl rotation and the 2R rotation is call an lr rotation. The text below uses 2R/2L.

These are the simple test cases for insert

"abc", on the insert of "c" will require a 1L rotation

a                   b
 \                 / \
  b   == 1L ==>   a   c
   \
    c

“cba”, on the insert of “a” will require a 1R rotation

    c               b
   /               / \
  b   == 1R ==>   a   c
 /
a 

"acb" on the insert of "b" will require a 2L rotation

a                  b
 \                / \
  c   == 2L ==>  a   c
 /
b

“cab” on the insert of “b” will require a 2R rotation

  c                b
 /                / \
a     == 2R ==>  a   c
 \
  b

For delete

“bcad”, on the deletion of “a” will require a 1L rotation

  b                   c
 x \                 / \
a   c   == 1L ==>   b   d
     \
      d

“cbda”, on the deletion of “d” will require a 1R rotation

    c                  b
   / x                / \
  b   d  == 1R ==>   a   c
 /
a 

“bdac” on the deletion of “a” will require a 2L rotation

  b                  c
 x \                / \
a   d   == 2L ==>  b   d
   /
  c

“cadb” on the deletion of “d” will require a 2R rotation

  c                  b
 / x                / \
a   d   == 2R ==>  a   c
 \
  b

The more complex test cases have sub-trees, most just being a single node. To make this post shorter, the insert and delete test cases are combined. The delete example becomes the insert example by skipping the insertion of the delete character. For example, using the 2R simple delete case above “cadb” becomes the insert case “cab” by skipping the insert of “d”. One consequence of this is the double rotate cases below require inserting an extra node to keep the tree balanced after inserting the node to be deleted. This results in the insert case not being minimal.

“cbedfag” on the delete of “a” or skipping “a” and the insert of “g” will require a 1L rotation at c

      c                 e
     / \               / \
    b   e  == 1R ==>  c   f
   x   / \           / \   \
  a   d   f         b   d   g
           \
            g

“ecfbdga” on the delete of “g” or skipping “g” and the insert of “a” will require a 1R rotation at e

      - e -                 c
     /     \               / \
    c       f  == 1R ==>  b   e
   / \     x             /   / \
  b   d   g             a   d   f
 /
a

“ecjadhkgilbf” on the delete of “b” or skipping “b” and the insert of “f” will require a 2L rotation at j then e. The insert case can optionally skip inserting “d”.

    - e -                       —- h —-
   /     \                     /       \
  c       j                   - e-      j
 / \     / \   == 2L ==>     /    \    / \
a   d   h   k               c      g  i   k
 x     / \   \             / \    /        \
  b   g   i   l           a   d  f          l
     /
    f

“hckbeiladfjg” on the delete of “j” or skipping “j” and the insert of “g” will require a 2R rotation at c then b. The insert case can optionally skip inserting “l”

      - h -                    - e -
     /     \                  /     \
    c       k                c       - h -
   / \     / \  == 2R ==>   / \     /     \
  b   e   i   l            b   d   f       k
 /   / \   x              /         \     / \
a   d   f   j            a           g   i   l
         \
          g

Use methods 1 and 2 from the question to verify the tree rebalanced as required (ie, verify tree is still in order and balanced). If you wanted to be really sure, convert the visual test cases to include a list of depth and balance values of the final tree to verify during an inorder traversal.

If you really want to hammer your implementation, you should do some black-box tests with lots of different insertion-order and removal-order patterns. Here are some ideas that come to mind:

  • Random order
  • Increasing order
  • Decreasing order
  • Interleave two streams, one with increasing, one with decreasing order
    • Start with similar values and diverge
    • Start at the ends and meet in the middle
    • Start at the ends and cross to opposite ends
  • Random-walk with upwards, downwards and neutral biases
  • Mix up insertions and removals in combinations of the above patterns.

You should not only test correctness, but also performance, subject to the above patterns, which may require building up largish data sets, so that you can meaningfully measure the performance. Everything is fast with 100 elements, but with 105 elements, the difference between O(N2) and O(N log N) will be huge.

You should also test for bad inputs, e.g., adding or removing the same value twice (assuming you don't allow duplicates).

For insert and delete, there are a particular number (about five for each, I recall) of tree operations which can occur.

You need to set up a tree immediately prior to one of these operations such that adding another particular element will cause a known one of these operations to occur.

You then inspect the tree - dump it out. It'll be a fairly simple tree, no more than about ten elements.

If every insert/delete operation works correctly, you will have validated the vital core behaviour of your tree.

(Note, one of the (I think it was) insert operations cannot be checked in this way - it's an intermediate state which exists temporarily).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top