Question

One property of Big-O notation is the sum rule, which states that when I have two functions $f_1$ and $f_2$ with $f_1 = O(g_1)$ and $f_2 = O(g_2)$, then the combined complexity is $f_1 + f_2 = O(\max(g_1, g_2))$.

But what do we pick if both complexity functions are equal but in different variables? E.g., if $f_1(n)$ is the cost of sorting an array of length $n$ and $f_2(m)$ the cost of sorting an array of length $m$, then the complexities are $O(n\log(n))$ and $O(m\log(m))$. Applying the rule would give $O(\max(n\log(n), m\log(m)))$. I think that picking either of those would yield a valid but very unintuitive estimate, since you would drop one variable. Besides that, it's not clear which one to pick.

Is it the case that you are not supposed to use the rule when there are multiple variables involved?


Solution

It depends on whether some variables are dependent on others, or whether they are given as independent parameters in the problem. For example, in many graph-related problems, $n$ can be the number of vertices and $m$ the number of edges. In this case, $m$ can be as large as $O(n^2)$. So in the general case, say we have an algorithm whose first phase runs in $O(n)$ and whose second phase runs in $O(m)$: we just add the two terms together and do not attempt to simplify --- the final runtime is $O(n+m)$.
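To make the two-phase pattern concrete, here is a minimal sketch of my own (not from the original answer): breadth-first search does $O(n)$ work over vertices plus $O(m)$ work over edges, and we report the sum $O(n+m)$ without simplifying. `adj` is assumed to be an adjacency list indexed by vertex.

```python
from collections import deque

def bfs_distances(n, adj, source=0):
    """Unweighted shortest-path distances from `source`.

    Phase 1 (initialization) touches every vertex: O(n).
    Phase 2 (the traversal) examines every edge a constant
    number of times: O(m).  Total: O(n + m), left unsimplified.
    """
    dist = [-1] * n                # Phase 1: O(n) initialization
    dist[source] = 0
    queue = deque([source])
    while queue:                   # Phase 2: O(m) over all edges
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:      # first visit fixes the distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```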

As for several variables given as parameters that aren't necessarily related: leaving multiple variables in the final bound is standard. E.g., I can say that multiplying an $m\times n$ matrix by an $n\times p$ matrix takes $O(mnp)$ time using schoolbook multiplication, or that solving the knapsack problem takes $O(nt)$ time, where $n$ is the number of items and $t$ is the target sum.
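For illustration, a minimal schoolbook-multiplication sketch (my own; plain nested lists stand in for matrices) shows where the three variables enter the bound:

```python
def matmul(A, B):
    """Schoolbook product of an m x n matrix A and an n x p matrix B.

    The triple loop executes its body exactly m * n * p times,
    hence the O(mnp) running time with all three variables kept.
    """
    m, n = len(A), len(A[0])
    n2, p = len(B), len(B[0])
    assert n == n2, "inner dimensions must match"
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```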

In special cases like planar graphs and other sparse graphs, we know that $m=O(n)$, so we can safely substitute $n$ in place of $m$ in the final running time for simplification.

As a digression, this illustrates why, on sparse graphs (where $m=O(n)$), one would prefer to run Dijkstra's algorithm in a loop for all-pairs shortest paths (in contrast to a transitive-closure-based algorithm like Floyd-Warshall): since Dijkstra's now runs in $O(m \log n) = O(n \log n)$, the overall complexity of running it from every vertex becomes $O(n^2 \log n)$, which is better than Floyd-Warshall's $O(n^3)$. By contrast, note that Dijkstra's runs in $O(m \log n)$ in the general case, which means that on dense graphs, where $m=O(n^2)$, it becomes less efficient than Floyd-Warshall in terms of worst-case running time.
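A sketch of that loop (my own illustration, assuming non-negative edge weights and adjacency lists of `(neighbor, weight)` pairs):

```python
import heapq

def dijkstra(n, adj, source):
    """Binary-heap Dijkstra: O(m log n), as in the answer above."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue               # skip stale heap entries
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def all_pairs(n, adj):
    """n runs of Dijkstra: O(n m log n) overall, which is
    O(n^2 log n) on sparse graphs where m = O(n)."""
    return [dijkstra(n, adj, s) for s in range(n)]
```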

In other cases, there can be variables that aren't even polynomially bounded by a given parameter --- take the knapsack example above, where $t$ can be exponential in $n$.
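A minimal DP sketch (my own, treating knapsack as subset-sum toward the target $t$, with positive integer items) makes the pseudo-polynomial $O(nt)$ bound visible: the table itself has $t+1$ entries, so the bound is exponential in the bit length of $t$.

```python
def subset_sum(items, t):
    """Can some subset of `items` sum to exactly t?

    The outer loop runs n times and the inner loop t times,
    giving O(n*t).  Since t is a numeric value, this is
    pseudo-polynomial: exponential in the bit length of t.
    """
    reachable = [False] * (t + 1)
    reachable[0] = True
    for x in items:                      # n iterations
        for s in range(t, x - 1, -1):    # up to t iterations each
            if reachable[s - x]:
                reachable[s] = True
    return reachable[t]
```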

OTHER TIPS

There is no general answer. By the way, if you want to simplify $O(\max(n\log(n), m\log(m)))$, you can rewrite it as $O((n+m) \log(n+m))$.

Moreover, if $g_1$ and $g_2$ are equal and increasing functions, you can write $O(g_1(n+m))$ instead of $O(\max(g_1(n), g_2(m)))$.
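As a sanity check of the rewrite above, here is a short derivation of my own for the concrete case $g(x) = x\log(x)$, assuming base-2 logarithms and writing $M = \max(n, m)$ with $M \ge 2$. Since $x\log(x)$ is increasing,

$$\max(n\log(n),\, m\log(m)) = M\log(M) \le (n+m)\log(n+m) \le 2M\log(2M) = 2M(1+\log(M)) \le 4M\log(M),$$

so the two expressions agree up to a constant factor, and $O(\max(n\log(n), m\log(m))) = O((n+m)\log(n+m))$.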

By definition, $O(g)$ should always come with a stated variable and a limit point with respect to which the $O$ is taken. For example, in $O(n^3), n \to \infty$, the variable is $n$ and the limit point is $\infty$; in $O(x^3), x \to 0$, the variable is $x$ and the limit point is $0$.

So when we write $f=O(g)$, we formally mean some variable and some limit point. For example, $f(n)=O(g(n)), n \to \infty$ is the precise form. Note that $f(m)=O(g(m)), m \to \infty$ means exactly the same thing.

When we speak about the property $f_1 + f_2 = O(\max(g_1, g_2))$, some variable(s) and some limit point should also be stated. So the precise form should be something like $$f_1(n) + f_2(n) = O(\max(g_1, g_2)(n)), n \to \infty$$ or we need more clarification with respect to the variables, assuming the limit point is $\infty$.
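For completeness, the standard formal definitions behind the two limit points mentioned above (textbook definitions, not specific to this answer) are:

$$f(n)=O(g(n)),\ n \to \infty \iff \exists C>0\ \exists N\ \forall n \ge N:\ |f(n)| \le C\,|g(n)|$$

$$f(x)=O(g(x)),\ x \to 0 \iff \exists C>0\ \exists \delta>0\ \forall x,\ 0<|x|<\delta:\ |f(x)| \le C\,|g(x)|$$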

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange