Question

I believe I have a reasonable grasp of complexities like $\mathcal{O}(1)$, $\Theta(n)$ and $\Theta(n^2)$.

In terms of a list, $\mathcal{O}(1)$ is a constant lookup, so it's just getting the head of the list. $\Theta(n)$ is where I'd walk the entire list, and $\Theta(n^2)$ is walking the list once for each element in the list.

Is there a similar intuitive way to grasp $\Theta(\log n)$ other than just knowing it lies somewhere between $\mathcal{O}(1)$ and $\Theta(n)$?


Solution

The $\Theta(\log n)$ complexity is usually connected with subdivision. When using lists as an example, imagine a list whose elements are sorted. You can search in this list in $\mathcal{O}(\log n)$ time - you do not actually need to look at each element because of the sorted nature of the list.

If you look at the element in the middle of the list and compare it to the element you search for, you can immediately say whether it lies in the left or right half of the array. Then you can just take this one half and repeat the procedure until you find it or reach a list with 1 item which you trivially compare.
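
For concreteness, here is a minimal iterative binary search sketch in Python (the name binary_search and the convention of returning -1 when the element is absent are my own choices for illustration, not part of the answer above):

def binary_search(sorted_list, target):
    """Return the index of target in sorted_list, or -1 if it is absent."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2              # look at the middle element
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1                  # target can only be in the right half
        else:
            hi = mid - 1                  # target can only be in the left half
    return -1                             # the search interval shrank to nothing

For example, binary_search([1, 3, 5, 7, 9], 7) returns 3 after inspecting only two elements.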

You can see that the list is effectively halved at each step. That means that for a list of length $32 = 2^5$, the maximum number of steps you need to reach a one-item list is $5$. If you have a list of $128 = 2^7$ items, you need only $7$ steps; for a list of $1024 = 2^{10}$ items you need only $10$ steps, and so on.

As you can see, the exponent $k$ in $2^k$ is exactly the number of steps necessary. The logarithm is used to "extract" exactly this exponent, for example $\log_2 2^{10} = 10$. The same idea generalizes to list lengths that are not powers of two.
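
A quick way to convince yourself of this relationship (a small sketch of my own, not from the original answer): repeatedly halve the list length and compare the step count with math.log2.

import math

def halving_steps(length):
    # Count how many halvings it takes to get from `length` items down to one item.
    steps = 0
    while length > 1:
        length //= 2
        steps += 1
    return steps

for n in (32, 128, 1024):
    print(n, halving_steps(n), math.log2(n))   # 32 -> 5, 128 -> 7, 1024 -> 10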

Other tips

In terms of (balanced) trees (say, a binary tree, so all $\log$s are base $2$):

  • $\Theta(1)$ is getting the root of the tree
  • $\Theta(\log n)$ is a walk from root to leaf (see the sketch after this list)
  • $\Theta(n)$ is traversing all nodes of the tree
  • $\Theta(n^2)$ is actions on all subsets of two nodes in the tree, e.g., the number of different paths between any two nodes.
  • $\Theta(n^k)$ - generalization of the above for any subset of $k$ nodes (for a constant $k$)
  • $\Theta(2^n)$ is actions on all the possible subsets of nodes (subsets of all the possible sizes, i.e., $k=1,2, \ldots, n$.). For instance, the number of different sub-trees of the tree.
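
To make the root-to-leaf item above concrete, here is a minimal sketch (the Node class and search function are my own illustration, assuming a balanced binary search tree):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search(root, key):
    # Walk from the root towards a leaf; in a balanced tree this visits Θ(log n) nodes.
    node = root
    while node is not None:
        if key == node.key:
            return node
        node = node.left if key < node.key else node.right
    return None

# A balanced tree with 7 nodes: any search touches at most 3 of them (log2(7+1) = 3 levels).
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
search(root, 5)   # visits 4 -> 6 -> 5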

For $O(\log n)$ to be possible, you need to be able to cut the problem size down by some constant fraction (independent of $n$) using a constant-time operation.

For example, in the case of binary search, you can cut down the problem size by a half with each comparison operation.

Now, do you have to cut down the problem size by half? Actually, no. An algorithm is $O(\log n)$ even if each step only cuts the search space down by, say, 0.0001%, as long as that percentage stays fixed and the operation used to shrink the problem takes constant time. It won't be a fast algorithm, but it is still $O(\log n)$, just with a very large constant factor (changing the base of the logarithm only changes that constant).
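
A small sketch of this point (my own illustration, not from the answer): even when each constant-time step keeps 99.99% of the search space, the step count is still proportional to $\log n$, just with a much larger constant.

import math

def steps_to_shrink(n, keep_fraction):
    # Count constant-time steps needed to shrink a search space of size n down to 1,
    # when each step keeps only `keep_fraction` of it.
    steps, size = 0, float(n)
    while size > 1:
        size *= keep_fraction
        steps += 1
    return steps

n = 1_000_000
print(steps_to_shrink(n, 0.5))      # about 20 steps, i.e. log2(n)
print(steps_to_shrink(n, 0.9999))   # about 138,000 steps, still Θ(log n)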

Think about the algorithm that converts a decimal number $n$ to binary:

while n != 0:
    print(n % 2, end=" ")   # emit the least-significant bit of n (digits come out in reverse order)
    n //= 2                 # integer division: drop that bit

This while loop runs $\Theta(\log n)$ times (about $\log_2 n$ iterations, one per binary digit of $n$).

Yes, $\log(n)$ is between $1$ and $n$, but it is much closer to $1$ than to $n$. What is $\log(n)$? The log function is the inverse of exponentiation. Let me start with exponentiation and you should get a better idea of what a logarithm is.

Consider two numbers, $100$ and $2^{100}$. $2^{100}$ is $2$ multiplied by itself $100$ times. You can, with some effort, count to $100$, but can you count to $2^{100}$? I bet you can't. Why? $2^{100}$ is such a big number (roughly $10^{30}$) that it is far greater than the number of stars in the observable universe, and even greater than the number of atoms in your body. Reflect on that for a moment: it is a number so huge that it would let you give every one of those atoms its own name (number). $2^{100}$ ought to be enough for anyone (pun intended :)).

Now, between the two numbers, $100$ and $2^{100}$, $100$ is the logarithm of $2^{100}$ (in base $2$). $100$ is comparatively tiny next to $2^{100}$. Anybody might have $100$ different items in their home, but $2^{100}$ is enough for the universe. Think home vs. universe when thinking of $\log(n)$ and $n$.

Where do exponentiation and logarithms come from? Why are they of so much interest in computer science? You may not notice it, but exponentiation is everywhere. Did you ever pay interest on a credit card? You just paid a universe for your home (not so bad, but the curve fits). I like to think that exponentiation comes from the product rule, but others are welcome to give more examples. What's the product rule, you may ask? I shall answer.

Say you have two cities $A$ and $B$, and there are two ways to go between them. What is the number of paths between them? Two. That is trivial. Now say there is another city $C$, and you can go from $B$ to $C$ in three ways. How many paths are there between $A$ and $C$ now? Six, right? How did you get that? Did you count them? Or did you multiply them? Either way, it is easy to see that both give the same result. Now if you add a city $D$ which can be reached from $C$ in four ways, how many ways are there between $A$ and $D$? Count if you don't trust me, but it is equal to $2 \cdot 3 \cdot 4$, which is $24$. Now suppose there are eleven cities arranged on a straight line, with two paths from each city to the next (ten segments in all). How many paths are there from start to end? Multiply them if you don't trust me, but I will tell you there are $2^{10}$, which is $1024$. See that $2^{10}$ is the exponential of $10$, and $10$ is the logarithm of $2^{10}$. $10$ is a small number compared to $1024$.
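
As a tiny worked check of the product rule above (my own snippet, not from the answer):

import math

math.prod([2, 3, 4])    # 24 ways from A to D, via B and C
math.prod([2] * 10)     # 1024 = 2**10 ways across the ten two-way segments
math.log2(1024)         # 10.0 -- the logarithm recovers the number of segments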

The logarithm function $\log_2(n)$ is to $n$ what $n$ is to $2^n$ (note that $2$ is the logarithm's base). If you multiply the base $b$ by itself $\log_b(n)$ times, you get $n$, i.e. $b^{\log_b(n)} = n$. $\log(n)$ is so tiny, so small compared with $n$, that it is the size of your home where $n$ is the size of the universe.

On a practical note, $\log(n)$ functions behave very much like constant functions. They do grow with $n$, but they grow very slowly. If you optimize a program that used to take a day so that it runs in logarithmic time, it will probably finish in the order of minutes. Check for yourself with problems on Project Euler.

To continue your theme, $O(\log n)$ is like repeatedly guessing where $x$ lies in the list and being told "higher" or "lower" (in terms of index).

It's still based on the size of the list, but you only need to visit a fraction of the elements.

If we have a divide-and-conquer algorithm that makes only one recursive call per subproblem, and we are in the second case of the Master theorem, i.e., the time complexity of the non-recursive part is $\Theta(\lg^k n)$, then the complexity of the algorithm will be $\Theta(\lg^{k+1} n)$.

In other words, when we have a divide-and-conquer algorithm that makes one recursive call to itself on a subproblem whose size is a constant fraction of the current problem, and the time spent in the non-recursive part is $\Theta(1)$ (constant), then the running time of the algorithm is going to be $\Theta(\lg n)$.

The binary search algorithm is the classical example.
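
Written as a recurrence (a worked instance of the rule above, using binary search): the algorithm makes one recursive call on half of the input and does $\Theta(1)$ non-recursive work, so its running time satisfies $T(n) = T(n/2) + \Theta(1)$. Unrolling the recurrence adds $\Theta(1)$ work for each of the $\log_2 n$ levels, giving $T(n) = \Theta(\lg n)$; this is the Master theorem's second case with $k = 0$.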

The intuition is that the number of times you can halve a number $n$ before it is reduced to $1$ is $O(\lg n)$.

For visualizing it, try drawing the work as a binary tree and counting the number of levels by solving this geometric progression:

$2^0 + 2^1 + \cdots + 2^h = n$

The left-hand side sums to $2^{h+1} - 1$, so the number of levels is $h = \lg(n+1) - 1 = \Theta(\lg n)$.
Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange