Optimality of Binary Search

https://stackoverflow.com/questions/4571478

14-10-2019
|

문제

This may be a silly question, but does anyone know of a proof that binary search is asymptotically optimal? That is, if we are given a sorted list of elements where the only permitted operation on those objects is a comparison, how do you prove that the search can't be done in o(lg n)? (That's little-o of lg n, by the way.) Note that I'm restricting this to elements where the only operation permitted operation is a comparison, since there are well-known algorithms that can beat O(lg n) on expectation if you're allowed to do more complex operations on the data (see, for example, interpolation search).

해결책

From here:

The number of possible outcomes should be at least O(n).
You can represent the decisions done by the algorithm by nodes of a "decision tree": if the items compare greater then it goes on way, if not it goes the other way. The nodes of the tree are the states of the algorithm and the leaves are the outcomes. So there should be at least O(n) nodes in the tree.
Each tree on O(n) nodes has at least O(log N) levels.

다른 팁

The logic is simple. Let's assume we have array of n different sorted elements.

There're 2 possible outcomes of 1 comparison (first element is smaller or greater). Thus, if one comparison is enough, n <= 2. Otherwise, if we have 3 elements (a, b, c) and our algorithm has 2 possible outcomes, one of 3 elements will never be selected.
There're 4 possible outcomes of 2 comparisons. Thus, if 2 comparisons is enough, n <= 4.
Likewise, for k comparisons to be enough n should be n <= 2^k.

Reverse the last inequality and you'll have logarithm: k >= log(2, n).

As Nikita described it's impossible to have something always better than O(log n).

Even if you allowed to do some additional operations, it's still not enough - I'm sure it's possible to prepare elements sequence where interpolation search will perform worse than binary search.

We can say interpolation search is better only because:

We consider average performance, not worst case.
Probability of each possible incoming data set is non-uniform.

So the answer is - it all depends on the additional knowledge we have about incoming data sets.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow