What goes wrong with sums of Landau terms?

https://cs.stackexchange.com/questions/366

16-10-2019
|

Question

I wrote

$\qquad \displaystyle \sum\limits_{i=1}^n \frac{1}{i} = \sum\limits_{i=1}^n \cal{O}(1) = \cal{O}(n)$

but my friend says this is wrong. From the TCS cheat sheet I know that the sum is also called $H_n$ which has logarithmic growth in $n$. So my bound is not very sharp, but is sufficient for the analysis I needed it for.

What did I do wrong?

Edit: My friend says that with the same reasoning, we can prove that

$\qquad \displaystyle \sum\limits_{i=1}^n i = \sum\limits_{i=1}^n \cal{O}(1) = \cal{O}(n)$

Now this is obviously wrong! What is going on here?

Solution

What you are doing is a very convenient abuse of notation.

Some pedants will say that what you write is nonsense, as $\mathcal{O}(f)$ denotes a set and you cannot do arithmetic operations on them the way you are doing.

But it is a good idea to ignore those pedants and assume that $\mathcal{O}(f)$ stands for some member of the set. So when we say $f(n) = g(n) + \mathcal{O}(n)$, what we really mean if that $f(n) - g(n) \in \mathcal{O}(n)$. (Note: some pedants might shudder at this statement too, claiming that $f(n)$ is a number and $f$ is the function!)

This makes it very convenient to write expressions like

$$ n \le \sum_{k=1}^n k^{1/k} \le n + \mathcal{O}(n^{1/3})$$

What this means is that there is some $f \in \mathcal{O}(n^{1/3})$ such that

$$ n \le \sum_{k=1}^n k^{1/k} \le n + f(n)$$

In your case of

$$\sum_{k=1}^{n} \frac{1}{k} = \sum_{k=1}^{n} \mathcal{O}(1) = \mathcal{O}(n)$$

you are abusing it even further and you need to be careful.

There are two possible interpretations here: Does the $\mathcal{O}(1)$ refer to a function of $n$, or a function of $k$?

I believe the right interpretation is to interpret it as a function of $k$.

If you try thinking of it as a function of $n$, thought not incorrect, it might lead to potential fallacies, like thinking $k$ is $\mathcal{O}(1)$ and trying to write $\sum_{k=1}^{n} k = \sum_{k=1}^{n} \mathcal{O}(1)$

If you try thinking of it as a function of $k$, then it is true that, if $f = \mathcal{O}(g)$ (as the argument goes to $\infty$) and $g$ is never $0$, that

$$ S(n) = \sum_{k=1}^{n} f(k) = \sum_{k=1}^{n} \mathcal{O}(g(k)) = \mathcal{O}(\sum_{k=1}^{n} |g(k)|)$$

Note that in the middle, we have used the convenient abuse of notation $\mathcal{O}(g(k))$ to mean that for some function $h \in \mathcal{O}(g)$ the sum is $\sum_{k=1}^{n} h(k)$. Note that the final function inside the $\mathcal{O}$ refers to a function of $n$. The proof is not that difficult, but you have to cater to the fact that you are dealing with an asymptotic upper bound (i.e. for sufficiently large arguments), but the sum starts right at $1$.

If you try thinking of it as a function of $n$, then it is also true that if $f = \mathcal{O}(g)$ (as the argument goes to $\infty$) then

$$ S(n) = \sum_{k=1}^{n} f(k) = \sum_{k=1}^{n} \mathcal{O}(g(n)) = \mathcal{O}(n g(n))$$

So your proof is essentially correct, in either interpretation.

OTHER TIPS

What you wrote is perfectly correct. The $n$th harmonic number is indeed in the set $O(n)$.

Proof: $\sum_{i=1}^n \frac{1}{i} \le \ln n + 1 \le 2n = O(n)$. $\square$

The upper bound $O(n)$ isn't tight, but it's correct.

For the second example, you can't claim that $$i = O(1)$$

since $i$ varies with $n$. After few steps it will be the case that $i>n/2$. A more appropriate way is to say that $i = O(n)$ since indeed, throughout the summation $i$ never exceeds $1\cdot n$. By this reasoning, $$\sum_{i=1}^n i = \sum_{i=1}^n O(n)= nO(n) = O(n^2)$$

However the right thing to do is actually use the big-O notation only at the end. Upper bound your summation as tight as you can, and only when your done use the asymptotic notations to avoid these pitfalls.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange