Question

How do you compute the average height of a binary search tree when adding 1000 random ints? What is the average height?

Was it helpful?

Solution

You can compute the height of a binary tree using this recursive definition:

height(empty) = 0
height(tree) = 1 + max(height(tree.left), height(tree.right))

One way to empirically measure the average height of such a tree is to repeatedly create an empty tree and add 1000 random items to it. Measure the height of each trial using the above function, and average them.

I suspect your task is probably to find a formula for the average height of a binary tree.

OTHER TIPS

This question got me asking whether you can definitively work this out without actually generating the trees.

I managed to write an application which could calculate the answer to what the average height would be if you added every possible permutation of N unique numbers to a naively implemented binary tree.

The answers I got are in this graph. (The X-axis is the number of items in the tree, the blue line is the average height, and the red line is the optimum possible height)

Graph of average height to minimum height

N     Average Height
2     2
16    7.039
32    9.280
64    11.679
256   16.783
343   17.896

Granitebolshevik is right: it's possible but statistically unlikely that a naively implemented tree will be the optimum height, without extra balancing functionality.

The algorithm has a complexity of O(N^2), and it isn't fast enough to calculate really large numbers.

It depends on whether you are using any sort of balanced tree structure (such as a red-black tree). Since you are inserting random numbers into a binary tree, it would be reasonable to expect that the average depth is approximately log2(1000) - so the values 10 and 11 would be 'normal'. I'm not sure how far it could deviate from that; no shallower than 10 levels, possibly somewhat deeper. An extreme case with no balancing would be 1000 deep; that is unlikely to happen with random numbers.

There doesn't appear to be a simple answer to this question, though there are a number of numeric approximations, e.g.:

Devroye, Luc. "A note on the height of binary search trees." Journal of the ACM (JACM) 33.3 (1986): 489-498.

Reed, Bruce. "The height of a random binary search tree." Journal of the ACM (JACM) 50.3 (2003): 306-332.

http://staff.ustc.edu.cn/~csli/graduate/algorithms/book6/chap13.htm

These approximations generally take the form: A ln n - B ln ln n + C

Where A~4.311 and B~1.953

So probably the most useful thing to say is that the average height for random insertions is O(log n), but if you actually need a numeric approximation I think (4.311 ln n - 1.953 ln ln n) would be close enough for large n.

For n=1000, that gives about 26 - which fits the experimental results reported elsewhere quite nicely.

This question is in fact, tricky. The answer will not be 1000, because that is improbable, but log2(1000) is also improbable, but even more so depending on how the tree is grown.

If you add an int by stepping though the tree then naively appending it the tree will virtually always be taller than log2(1000).

Talk to a statistician, because this seems related to normal probability distributions. Those are generated by lots of iterated random events( heads one unit right, tails ditto left), and the value of a random integer iterates through the tree as it settles out into a leaf.

it depends on the order the are added. If you start with the smallest value then the tree will be deeper because all new values will be added to the right child BST. If you add the largest value first then the left child will be deep while the right is empty.

Regardless of what tree you are using the average height will be log2(1000), as someone mentioned before. It's true that depending on the order of the numbers inserted the actual height may vary, but assuming randomly distributed numbers, which you mention, then the actual value will, more often than not, approximate the expected value (Which, once again, is log2(1000))

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top