Question

In real world applications is there a concrete benefit when using $\mathcal{O}(\log(\log(n))$ instead of $\mathcal{O}(\log(n))$ algorithms ?

This is the case when one use for instance van Emde Boas trees instead of more conventional binary search tree implementations. But for example, if we take $n < 10^6$ then in the best case the double logarithmic algorithm outperforms the logarithmic one by (approximately) a factor of $5$. And also in general the implementation is more tricky and complex.

Given that I personally prefer BST over VEB-trees, what do you think ?

One could easily demonstrate that :

$\qquad \displaystyle \forall n < 10^6.\ \frac{\log n}{\log(\log(n))} < 5.26146$

Was it helpful?

Solution

Do not forget that $\log n$ still grows exponentially (in $\log(n)$) faster than $\log(\log n)$!

Indeed, if you look at the quotient of $\log(n)$ and $\log(\log(n))$, there is not much impressive to see:

log(n)/log(log(n))
[source]

But still, you get a factor five to six for sizes up to $100000$. Note that larger sizes are not uncommon in practice, and a speedup by that factor is awesome! It may make the difference between having results after lunch or only tomorrow. Be aware that part of the speedup may be eaten away by higher constants of the tree implementation; you would have to plot (or analyse) $c\cdot \log(n)$ and $d\cdot \log(\log(n))$ with $c,d$ the actual runtime constants to get a real picture.

Additionally, what Dave mentions is important: if the operation sped up thusly is executed, say, linearly often, constant speedups become linear speedups, i.e. you may decrease the leading constant of your entire algorithm! As I said above, that is awesome. Just look at what happens if you run the operation $n$ times:

n*log(n)/(n*log(log(n)))
[source]

Now if that's not worth the trouble I don't know what.

OTHER TIPS

One could imagine that the difference in complexity really does not matter so much, and that the actual run-time is more important. But if the algorithm is at the core of another algorithm, then this difference may be important.

From a purely theoretical purpose, the difference of course does matter, especially if the algorithm is a part of another. It may put the larger algorithm in a different complexity class.

I've actually benchmarked the van Emde-Boas tree myself once. I compared it with an AA Tree, a hashmap and a bit array.

The tests perform size inserts with random numbers in the interval [0, bound], then size searches, then size deletes and then again size searches. Deletes are done on random numbers as well, so you first have to figure out if they are in the structure at all.

Here are the results (size=2000000, bound=10000000) in seconds:

AATreeLookup - O(n log n)
Inserting... 3.3652452
Searching... 5.2280724
Deleting...  7.3457427
Searching... 9.1462039
HashLookup - O(n) expected
Inserting... 0.3369505
Searching... 0.6223035
Deleting...  0.9062163
Searching... 1.1718223
VanEmdeBoasTree - O(n log log n)
Inserting... 0.7007531
Searching... 1.1775800
Deleting...  1.7257065
Searching... 2.2147703
ArrayLookup - O(n)
Inserting... 0.0681897
Searching... 0.1720300
Deleting...  0.2387776
Searching... 0.3413800

As you can see, van Emde-Boas trees are about twice as slow as hash maps, ten times as slow as bit arrays, and 5 times as fast as binary search trees.

Of course the above needs a disclaimer: the tests are artificial, you can possibly improve the code or use a different language with a compiler whose output is faster, and so on and so forth.

This disclaimer is at the heart of the reason we use asymptotic analysis in algorithms design: as you have no idea what the constants are and as the constants can change depending on environmental factors, the best we can do is an asymptotic analysis.

Now, in the case of $\log n$ versus $\log \log n$: in the above example, my van Emde-Boas tree is able to contain $2^{32}$ elements. $\log 2^{32} = 32$, and $\log 32 = 5$, which is a factor 6 improvement, which is quite a bit in practice. Additionally, van Emde-Boas trees have good constant factors (it's all about constant factors in practice for differences this small) as they don't need to balance themselves.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top