Question

I recently heard about nedtries and decided to try implementing them, but something bothers me about the complexity of their search operation; I can't understand why they are supposed to be so fast.

From what I understand, the expected complexity of their search operation is O(m/2), where m is the size of the key in bits. If you compare that to the complexity of the search operation in a traditional binary tree, you get: log2(n) >= m/2

Let the key be 32 bits long: log2(n) >= 16 <=> n >= 65536

So nedtries should only become faster than binary trees starting from 65536 items. However, the author claims they are always faster than binary trees, so either my assumption about their complexity is wrong or the computations performed at each step of the search are vastly faster in a nedtrie.
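To make the comparison concrete, here is the rough step-count arithmetic I have in mind (illustrative only, ignoring the per-step constant factors, which may be exactly what I'm missing):

```c
/* Rough back-of-the-envelope comparison: expected search steps for a
 * balanced binary tree (log2 n) versus a bitwise trie at O(m/2) with
 * m = 32-bit keys. Illustrative only; per-step costs are ignored. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double trie_steps = 32.0 / 2.0;            /* m/2 with 32-bit keys */
    for (unsigned n = 1024; n <= 1u << 20; n *= 4) {
        double tree_steps = log2((double)n);         /* binary tree depth    */
        printf("n=%8u  tree=%5.1f  trie=%4.1f\n", n, tree_steps, trie_steps);
    }
    return 0;
}
```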

So, what about it?


Solution

If you have smaller trees, you can use smaller keys!

OTHER TIPS

(Note: I'm the author of nedtries.) I think my explanation of complexity on the front page of nedtries makes sense? Perhaps not.

The key thing you're missing is that it's the difference between bits which determines complexity: the greater the difference between keys, the lower the search cost, whereas the smaller the difference, the higher the search cost.

The fact this works stems from modern out-of-order processors. As a gross simplification, if you avoid touching main memory your code runs about 40-80x faster than if you are dependent on main memory. That means you can execute 50-150 ops in the time it takes to load a single thing from memory, which in turn means you can do a bit scan and figure out which node to look at next in not much longer than the time it takes to load that node's cache line from memory.
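As a minimal sketch of that trade-off (a generic bitwise-trie lookup, not nedtries' actual code), the per-level work is a single bit test that stays in registers, while the pointer chase to the next node is where the cache-miss cost lives:

```c
/* Generic bitwise-trie descent sketch: one bit test per level (cheap),
 * one pointer chase per level (the dominant, cache-miss-bound cost). */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct node {
    uint32_t     key;
    struct node *child[2];   /* child[0]: next bit clear, child[1]: next bit set */
};

static struct node *trie_find(struct node *n, uint32_t key)
{
    int bit = 31;                              /* start at the most significant bit  */
    while (n != NULL) {
        if (n->key == key)
            return n;                          /* register compare: effectively free */
        if (bit < 0)
            break;                             /* all 32 bits consumed, no match     */
        n = n->child[(key >> bit--) & 1u];     /* this load is the dominant cost     */
    }
    return NULL;
}

int main(void)
{
    /* Tiny hand-built trie: the root carries one key, its children two more. */
    struct node low  = { 0x00000001u, { NULL, NULL } };
    struct node high = { 0xC0000000u, { NULL, NULL } };
    struct node root = { 0x80000000u, { &low, &high } };

    printf("found 0xC0000000: %s\n", trie_find(&root, 0xC0000000u) ? "yes" : "no");
    printf("found 0x12345678: %s\n", trie_find(&root, 0x12345678u) ? "yes" : "no");
    return 0;
}
```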

This effectively removes the logic, the bit scanning and everything else from the complexity analysis: they could all be O(N^N) and it wouldn't matter. What matters now is that selecting the next node to look at is effectively free, so the number of nodes which must be loaded for examination is the scaling constraint, and therefore the average number of nodes looked at out of the total number of nodes is the average complexity, because main memory's slowness is by far the biggest constraint.

Does this make sense? It means weirdnesses like: if keys are densely packed at one end of the key space but loosely packed at the other, searches in the densely packed end will be very considerably slower (approaching O(log N), where N is the number of dense elements) than searches in the loosely packed end (approaching O(1)).
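A rough way to see this (a sketch under my own assumptions, not the library's internals): two keys part ways at their first differing bit, so keys spread widely across the key space diverge near the root of a bitwise trie, while densely packed keys share long prefixes and have to be walked much deeper before they separate.

```c
/* Sketch (not nedtries' internals): in a trie that branches from the most
 * significant bit downwards, a high differing-bit position means two keys
 * diverge near the root (short search), while a low position means they
 * share a long prefix and force a deep walk. Uses the GCC/Clang
 * __builtin_clz intrinsic. */
#include <stdint.h>
#include <stdio.h>

/* Position of the highest bit in which a and b differ (assumes a != b,
 * since __builtin_clz(0) is undefined). */
static int first_differing_bit(uint32_t a, uint32_t b)
{
    return 31 - __builtin_clz(a ^ b);
}

int main(void)
{
    /* Loosely packed: keys spread across the 32-bit space diverge at bit 31. */
    printf("sparse keys diverge at bit %d\n",
           first_differing_bit(0x80000000u, 0x00000001u));
    /* Densely packed: consecutive small integers only diverge at bit 0. */
    printf("dense keys diverge at bit  %d\n",
           first_differing_bit(1000u, 1001u));
    return 0;
}
```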

Someday soon I'll get round to adding new functions which take advantage of this feature of bitwise tries, so you can say "add this node to a loosely/densely packed space and return the key you chose" and all sorts of variations on that theme. Sadly, as always, it comes down to time and demands on one's time.

Niall

Licensed under: CC-BY-SA with attribution