Question

In most introductory algorithm classes, notations like $O$ (Big O) and $\Theta$ are introduced, and a student would typically learn to use one of these to find the time complexity.

However, there are other notations, such as $o$, $\Omega$ and $\omega$. Are there any specific scenarios where one notation would be preferable to another?

Solution

You are referring to Landau notation. These are not different symbols for the same thing; they have entirely different meanings. Which one is "preferable" depends entirely on the desired statement.

$f \in \cal{O}(g)$ means that $f$ grows at most as fast as $g$, asymptotically and up to a constant factor; think of it as a $\leq$. $f \in o(g)$ is the stricter form, i.e. $<$.

$f \in \Omega(g)$ has the symmetric meaning: $f$ grows at least as fast as $g$. $\omega$ is its stricter cousin. You can see that $f \in \Omega(g)$ is equivalent to $g \in \cal{O}(f)$.

$f \in \Theta(g)$ means that $f$ grows about as fast as $g$; formally $f \in \cal{O}(g) \cap \Omega(g)$. $f \sim g$ (asymptotic equality) is its stronger form. We often mean $\Theta$ when we use $\cal{O}$.
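
For reference, one common way to make these statements formal for nonnegative functions (other, non-equivalent definitions exist, as noted below) is:

$$\begin{aligned}
f \in \cal{O}(g) &\iff \exists c > 0 \; \exists n_0 \; \forall n \ge n_0: \; f(n) \le c \cdot g(n)\\
f \in o(g) &\iff \forall c > 0 \; \exists n_0 \; \forall n \ge n_0: \; f(n) \le c \cdot g(n)\\
f \in \Omega(g) &\iff \exists c > 0 \; \exists n_0 \; \forall n \ge n_0: \; f(n) \ge c \cdot g(n)\\
f \in \omega(g) &\iff \forall c > 0 \; \exists n_0 \; \forall n \ge n_0: \; f(n) \ge c \cdot g(n)\\
f \in \Theta(g) &\iff f \in \cal{O}(g) \text{ and } f \in \Omega(g)
\end{aligned}$$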

Note how $\cal{O}(g)$ and its siblings are function classes. It is important to be very aware of this and of their precise definitions -- which can differ depending on who is talking -- when doing "arithmetic" with them.

When proving things, take care to work with your precise definition. There are many definitions for Landau symbols around (all with the same basic intuition), some of which are equivalent on some sets of functions but not on others.

Suggested reading:

If you are interested in using Landau notation in a rigorous and sound manner, you may be interested in recent work by Rutanen et al. [1]. They formulate necessary and sufficient criteria for asymptotic notation as we use them in algorithmics, show that the common definition fails to meet them and provide a (the, in fact) workable definition.


  1. A general definition of the O-notation for algorithm analysis by K. Rutanen et al. (2015)

Other tips

Big O: upper bound

“Big O” ($O$) is by far the most common one. When you analyse the complexity of an algorithm, most of the time, what matters is to have some upper bound on how fast the run time¹ grows when the size of the input grows. Basically we want to know that running the algorithm isn't going to take “too long”. We can't express this in actual time units (seconds), because that would depend on the precise implementation (the way the program is written, how good the compiler is, how fast the machine's processor is, …). So we evaluate what doesn't depend on such details, which is how much longer it takes to run the algorithm when we feed it bigger input. And we mainly care when we can be sure that the program is done, so we usually want to know that it will take such-and-such amount of time or less.

To say that an algorithm has a run time of $O(f(n))$ for an input size $n$ means that there exists some constant $K$ such that the algorithm completes in at most $K \, f(n)$ steps, i.e. the running time of the algorithm grows at most as fast as $f$ (up to a scaling factor). Writing $T(n)$ for the run time of the algorithm on input size $n$, $T(n) = O(f(n))$ informally means that $T(n) \le f(n)$ up to some scaling factor.
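
As a rough illustration (a minimal sketch; the toy algorithm, its step counts and the constant $K = 4$ are assumptions made up for this example), one can instrument an algorithm to count its basic steps and check that they stay below $K \, f(n)$:

```python
def count_steps(n):
    """A toy algorithm whose exact running time is T(n) = 3*n**2 + 5*n + 7 basic steps."""
    steps = 7                    # fixed setup cost
    for i in range(n):
        steps += 5               # per-element work
        for j in range(n):
            steps += 3           # per-pair work
    return steps

# T(n) = 3n^2 + 5n + 7 is O(n^2): the single constant K = 4 works
# from some point on (here, for all n >= 7).
K = 4
for n in [10, 50, 100, 500, 1000]:
    assert count_steps(n) <= K * n**2
```

Of course, a finite experiment cannot prove an asymptotic statement; it only illustrates what the definition asks for.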

Lower bound

Sometimes, it is useful to have more information than an upper bound. $\Omega$ is the converse of $O$: it expresses that a function grows at least as fast as another. $T(n) = \Omega(g(n))$ means that $T(n) \ge K' g(n)$ for some constant $K'$, or to put it informally, $T(n) \ge g(n)$ up to some scaling factor.

When the running time of the algorithm can be determined precisely, $\Theta$ combines $O$ and $\Omega$: it expresses that the rate of growth of a function is known, up to a scaling factor. $T(n) = \Theta(h(n))$ means that $K h(n) \ge T(n) \ge K' h(n)$ for some constants $K$ and $K'$. Informally speaking, $T(n) \approx h(n)$ up to some scaling factor.
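
For instance, if the exact step count of an algorithm is $T(n) = 3n^2 + 5n$, then for all $n \ge 1$

$$3 n^2 \;\le\; 3n^2 + 5n \;\le\; 8 n^2,$$

so $T(n) = \Theta(n^2)$, with $K' = 3$ and $K = 8$.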

Further considerations

The “little” $o$ and $\omega$ are used far less often in complexity analysis. Little $o$ is stronger than big $O$; where $O$ indicates a growth that is no faster, $o$ indicates that the growth is strictly slower. Conversely, $\omega$ indicates a strictly faster growth.

I've been slightly informal in the discussion above. Wikipedia has formal definitions and a more mathematical treatment.

Keep in mind that the use of the equal sign in $T(n) = O(f(n))$ and the like is a misnomer. Strictly speaking, $O(f(n))$ is a set of functions of the variable $n$, and we should write $T \in O(f)$.

Example: some sorting algorithms

As this is rather dry, let me give an example. Most sorting algorithms have a quadratic worst case run time, i.e. for an input of size $n$, the run time of the algorithm is $O(n^2)$. For example, selection sort has an $O(n^2)$ run time, because selecting the $k$th element requires $n-k$ comparisons, for a total of $n(n-1)/2$ comparisons. In fact, the number of comparisons is always exactly $n(n-1)/2$, which grows as $n^2$. So we can be more precise about the time complexity of selection sort: it is $\Theta(n^2)$.
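
As a quick check of this count (a minimal sketch; the function name is made up for the example), selection sort can be instrumented to count its comparisons:

```python
def selection_sort_comparisons(a):
    """Sort a copy of `a` with selection sort and return the number of
    element comparisons performed."""
    a = list(a)
    comparisons = 0
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            comparisons += 1              # compare a[j] with the current minimum
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return comparisons

# The count is exactly n(n-1)/2 regardless of the input.
for n in [10, 100, 1000]:
    assert selection_sort_comparisons(list(range(n, 0, -1))) == n * (n - 1) // 2
```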

Now take merge sort. Merge sort is also $O(n^2)$. This is true, but not very precise. Merge sort in fact has a running time of $O(n \: \mathrm{lg}(n))$ in the worst case. Like selection sort, merge sort's workflow is essentially independent of the shape of the input, and its running time is always $n \: \mathrm{lg}(n)$ up to a constant multiplicative factor, i.e. it is $\Theta(n \: \mathrm{lg}(n))$.
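
Again as a sketch (counting only element comparisons during merging and ignoring all other work; the function name is made up for the example), one can watch the ratio of merge sort's comparison count to $n \: \mathrm{lg}(n)$ stay bounded:

```python
import random
from math import log2

def merge_sort_comparisons(a):
    """Return (sorted list, number of element comparisons) for merge sort."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, c_left = merge_sort_comparisons(a[:mid])
    right, c_right = merge_sort_comparisons(a[mid:])
    merged, comparisons = [], c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1                  # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons

for n in [2**10, 2**14]:
    _, c = merge_sort_comparisons(random.sample(range(n), n))
    print(n, c, c / (n * log2(n)))        # the ratio stays below 1 and does not vanish
```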

Next, consider quicksort. Quicksort is more complex. It is certainly $O(n^2)$. Furthermore, the worst case of quicksort is quadratic: the worst case is $\Theta(n^2)$. However, the best case of quicksort (when the input is already sorted) is linear: the best we can say for a lower bound to quicksort in general is $\Omega(n)$. I won't repeat the proof here, but the average complexity of quicksort (the average taken over all possible permutations of the input) is $\Theta(n \: \mathrm{lg}(n))$.
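
The gap between quicksort's worst and average case can also be seen experimentally. The sketch below uses a deliberately naive pivot choice (the first element) and an explicit stack so that the already-sorted worst case does not exhaust Python's recursion limit; the function name is made up for the example:

```python
import random

def quicksort_comparisons(a):
    """Count the comparisons made by a naive quicksort (first element as pivot)."""
    comparisons = 0
    stack = [list(a)]
    while stack:
        seg = stack.pop()
        if len(seg) <= 1:
            continue
        pivot = seg[0]
        smaller, larger = [], []
        for x in seg[1:]:
            comparisons += 1              # compare x with the pivot
            if x < pivot:
                smaller.append(x)
            else:
                larger.append(x)
        stack.append(smaller)
        stack.append(larger)
    return comparisons

n = 1000
print(quicksort_comparisons(list(range(n))))              # sorted input: n(n-1)/2 = 499500
print(quicksort_comparisons(random.sample(range(n), n)))  # random input: roughly 1.39 * n * lg(n)
```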

There are general results on the complexity of sorting algorithms in common settings. Assume that a sorting algorithm can only compare two elements at a time, with a yes-or-no result (either $x \le y$ or $x > y$). Then it is obvious that any sorting algorithm's running time is always $\Omega(n)$ (where $n$ is the number of elements to sort), because the algorithm has to compare every element at least once to know where it will fit. This lower bound can be met, for example, if the input is already sorted and the algorithm merely compares each element with the next one and keeps them in order (that's $n-1$ comparisons).

What is less obvious is that the maximum running time is necessarily $\Omega(n \: \mathrm{lg}(n))$. It's possible that the algorithm will sometimes make fewer comparisons, but there has to be some constant $K$ such that for any input size $n$, there is at least one input on which the algorithm makes more than $K \, n \: \mathrm{lg}(n)$ comparisons. The idea of the proof is to build the decision tree of the algorithm, i.e. to follow the decisions the algorithm takes based on the result of each comparison. Since each comparison returns a yes-or-no result, the decision tree is a binary tree. There are $n!$ possible permutations of the input, and the algorithm needs to distinguish between all of them, so the decision tree must have at least $n!$ leaves. Since the tree is binary, it needs a depth of at least $\mathrm{lg}(n!) = \Theta(n\:\mathrm{lg}(n))$ to fit that many leaves. The depth is the maximum number of decisions the algorithm can take, so running the algorithm involves at least this many comparisons on some input: the maximum running time is $\Omega(n \: \mathrm{lg}(n))$.
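
The only step that may need spelling out is $\mathrm{lg}(n!) = \Theta(n\:\mathrm{lg}(n))$, which follows without Stirling's formula:

$$\mathrm{lg}(n!) = \sum_{k=1}^{n} \mathrm{lg}(k) \le n\:\mathrm{lg}(n),
\qquad
\mathrm{lg}(n!) \ge \sum_{k=\lceil n/2 \rceil}^{n} \mathrm{lg}(k) \ge \frac{n}{2}\:\mathrm{lg}\!\left(\frac{n}{2}\right) = \Omega(n\:\mathrm{lg}(n)).$$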

¹ Or other resource consumption such as memory space. In this answer, I only consider running time.

Typically $O$ is used to state upper bounds (an estimate from above), $\Omega$ to state lower bounds (an estimate from below), and $\Theta$ when the two match, in which case the result is usually stated with $\Theta$.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange