Question

I've always been concerned about the space usage of a radix tree, but I haven't found any helpful discussion of it.

Now let's say we have a radix tree implementation like the one in Linux's radix-tree.c, which takes an integer key and uses each group of 6 bits to index the next position in the tree. I can easily think of cases where the radix tree's space usage is far greater than a binary search tree's. Please correct me if I'm wrong:

Use case: the 64 keys (0,1,1,1,1), (1,1,1,1,1), (2,1,1,1,1), ..., (63,1,1,1,1).

Here just for convenience, I use (a,b,c,d,e) to represent a 30-bit integer key, with each element standing for a 6-bit value. a is MSB and e is LSB.

Radix Tree:

For this use case, the radix tree will have a height of 5. The root is shared, but each key needs 4 separate nodes of its own below it, because the keys fall into different subtrees of the root. So there will be (5 - 1) * 64 + 1 = 257 nodes.

Each node contains 2^6 = 64 pointers, so assuming 4-byte pointers the tree uses 257 * 64 * 4 bytes = 65,792 bytes ≈ 65 KB.
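To make the indexing concrete, here is a rough sketch (plain C, not the actual kernel code; the names BITS_PER_LEVEL, LEVELS and SLOT_MASK are just mine) of how one of these keys breaks into five 6-bit indices, most significant chunk first:

    #include <stdio.h>

    #define BITS_PER_LEVEL 6
    #define LEVELS         5
    #define SLOT_MASK      ((1u << BITS_PER_LEVEL) - 1)   /* 0x3f */

    int main(void)
    {
        /* The key (7,1,1,1,1) in the (a,b,c,d,e) notation above. */
        unsigned int key = (7u << 24) | (1u << 18) | (1u << 12) | (1u << 6) | 1u;

        for (int level = 0; level < LEVELS; level++) {
            unsigned int shift = (LEVELS - 1 - level) * BITS_PER_LEVEL;
            unsigned int index = (key >> shift) & SLOT_MASK;
            printf("level %d: slot %u\n", level, index);  /* which of the 64 slots to follow */
        }
        return 0;
    }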

Binary Search Tree

We only care about how many keys there are. In this case there are 64 keys.

Assume each BST node uses 3 pointers; then the tree uses 64 * 3 * 4 bytes = 768 bytes.

Comparison

It looks like the radix tree is very space-inefficient: it uses roughly 85 times as much space as the binary search tree for the same set of keys! I don't understand why it is used, even in the Linux kernel.
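For reference, here is my back-of-the-envelope arithmetic as a tiny program (the 4-byte pointer size is just the assumption I used above):

    #include <stdio.h>

    int main(void)
    {
        const int slots = 64, levels = 5, keys = 64, ptr_size = 4;

        int radix_nodes = (levels - 1) * keys + 1;        /* shared root + 4 nodes per key = 257 */
        int radix_bytes = radix_nodes * slots * ptr_size; /* 257 * 64 * 4 = 65792 B */
        int bst_bytes   = keys * 3 * ptr_size;            /* 64 * 3 * 4 = 768 B */

        printf("radix: %d B, bst: %d B, ratio: %.1f\n",
               radix_bytes, bst_bytes, (double)radix_bytes / bst_bytes);  /* ratio ~85.7 */
        return 0;
    }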

Am I missing something? Thanks.


Solution

You asked for space complexity, so let's work it out.

If we consider a non-null pointer in a leaf node to be a value of interest, then it is not hard to prove by contradiction that the worst case, in space used per stored value, is a fully populated tree with one value per leaf node.

If branching is N-way (in your use case 64) and height is H (in your use case 5), there are N^(H-1) leaf nodes in this tree, storing an equal number of values. The total number of nodes is

1 + N + N^2 + ... + N^(H-1) = (N^H - 1) / (N - 1)

So the storage requirement measured in pointers is N times this amount.

(N^H - 1) * N / (N - 1)

This yields a storage efficiency of

(N^H - 1) * N / (N - 1)
-----------------------
        N^(H-1)

This is the total number of pointers divided by the count of valid data pointers.

As N gets bigger, this approaches roughly N; more precisely, for large H it tends to N^2 / (N - 1) ≈ N + 1. In your example use case it's about 65.0 (for N = 64). So we can say the storage complexity is O(NV), where V is the number of data values to be stored.
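If you want to check that number, here is a small throwaway program (nothing kernel-specific, just the formula above evaluated for N = 64, H = 5):

    #include <stdio.h>

    int main(void)
    {
        const unsigned long long N = 64, H = 5;

        unsigned long long n_pow_h = 1;                   /* N^H */
        for (unsigned long long i = 0; i < H; i++)
            n_pow_h *= N;

        unsigned long long total_nodes    = (n_pow_h - 1) / (N - 1); /* 1 + N + ... + N^(H-1) */
        unsigned long long total_pointers = N * total_nodes;
        unsigned long long values         = n_pow_h / N;  /* N^(H-1) leaves, one value each */

        printf("pointers per stored value: %.1f\n",
               (double)total_pointers / values);          /* ~65.0, i.e. roughly N + 1 */
        return 0;
    }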

Though we got here with first-principles analysis, it makes total sense. Storage for the leaf level of the complete tree dominates the rest by a factor of nearly N. The size of that storage is NV.

Of course the advantage of trees with enormous branching factors like this (and e.g. B-trees in databases) is that fewer node traversals are needed to get to the right leaf.

Moreover, when each traversal is a single array lookup as in the radix tree, you can't get much faster.

In your use case, a perfectly balanced binary search tree holding the full 2^30-key space would require up to 30 comparisons, with the attendant risk of mispredicted branches flushing the pipeline. Compared to 5 array-indexing operations, that could be much slower. Array indexing tends to be faster than comparison because it is non-branching code. But even if the per-step costs were the same, a binary tree would need only 2^5 = 32 elements to cause the same amount of lookup work as a radix tree containing 2^30 elements.

To generalize this: if key comparisons and array-index operations have the same cost, a binary tree of 2^H elements requires the same lookup effort as a radix tree of height H, which can hold up to N^H elements.
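To make that concrete, here is a rough sketch of the two lookup loops; the node layouts are simplified stand-ins of my own, not the kernel's actual struct radix_tree_node:

    #include <stddef.h>

    #define BITS_PER_LEVEL 6
    #define LEVELS         5
    #define SLOT_MASK      ((1u << BITS_PER_LEVEL) - 1)

    struct radix_node { void *slots[1 << BITS_PER_LEVEL]; };

    /* H = 5 non-branching array-indexing steps, one per 6-bit chunk of the key. */
    void *radix_lookup(struct radix_node *root, unsigned int key)
    {
        void *slot = root;
        for (int level = 0; level < LEVELS && slot != NULL; level++) {
            unsigned int shift = (LEVELS - 1 - level) * BITS_PER_LEVEL;
            slot = ((struct radix_node *)slot)->slots[(key >> shift) & SLOT_MASK];
        }
        return slot;  /* NULL if the key is absent */
    }

    struct bst_node { unsigned int key; void *value; struct bst_node *left, *right; };

    /* Up to ~30 compare-and-branch steps if the tree holds on the order of 2^30 keys. */
    void *bst_lookup(struct bst_node *node, unsigned int key)
    {
        while (node != NULL) {
            if (key == node->key)
                return node->value;
            node = (key < node->key) ? node->left : node->right;
        }
        return NULL;
    }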

As others have said, if the index bits for the top levels of the tree tend to share a few common prefixes (e.g. because they are the top bits of addresses in the same VM space), the worst-case storage behavior of the radix tree does not occur.

OTHER TIPS

Radix trees are used a lot for holding long strings with common/shared prefixes. In that case the radix tree will be much more economical.

For the sort of data you're specifying it's a different story.

Edit

A good example of long strings with shared prefixes is storing all file names, with their full paths, on your computer. With such data, a radix tree will be more economical than the alternatives and very fast for checking whether a file name exists. It might even be faster in some cases than a hash table.

Look at these 2 files:

  • "c:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\streambuf"
  • "c:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\string"

Their shared prefix is: "c:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\include\str", which is stored only once.

The radix tree in Linux originally appeared as a data structure to support the page cache, where the keys are file offsets and such pathological key distributions are uncommon.

(FWIW, the initial variant used a splay tree, but Linus said no :)

The radix tree is wide and shallow, so a lookup in it accesses comparatively few different cache lines, which is, obviously, quite beneficial for performance.

It also has the property that locality in page-cache accesses translates into locality in radix-tree node accesses, unlike alternative designs such as a hash table.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow