Question

If I have an algorithm that takes n log n steps (e.g. heapsort), where each step takes log n time (e.g. comparison/exchange of "big" integers in the range 0 to n-1), what is the asymptotic bound for the whole process?

Obviously we can say "n (log n) (log n)", but I'm having a hard time trying to convince myself that I cannot simplify this to "n log n". At the same time, I'm having a hard time justifying the instinct that insists that I can.

Is my instinct just plain wrong on this?

EDIT

It seems my simple-example-to-avoid-complicating-the-issue has complicated the issue. Oh well.

The real reason for the question is that I often have a standard algorithm with a known complexity, but implemented using different underlying containers, so that the individual steps are O(log n) rather than O(1). For example, Hopcroft's automaton minimisation algorithm is O(n log n) - but if you start using binary tree containers for the states, transitions and intermediate results, the steps themselves become O(log n) - the O(n log n) is no longer valid because the assumption of O(1) accesses is invalid.
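To make that concrete, here is a rough sketch (in Python, purely for illustration; the names are made up) of the kind of substitution I mean - the same membership-test "step", once against a hash-based container with O(1) expected cost, and once against a sorted container where each access costs O(log n) comparisons:

    import bisect

    # Illustrative only: the same "one step" of an algorithm, backed by
    # two different containers.

    states_hashed = set(range(1000))        # hash-based: membership test is O(1) expected
    states_sorted = sorted(states_hashed)   # sorted list: membership test via binary search

    def step_hashed(q):
        # One "step" of the algorithm with O(1) expected cost.
        return q in states_hashed

    def step_tree_like(q):
        # The same "step" with O(log n) cost (binary search standing in
        # for a balanced-tree lookup).
        i = bisect.bisect_left(states_sorted, q)
        return i < len(states_sorted) and states_sorted[i] == q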

Still, people will claim that there are n states and m transitions, but n and m tend to be linearly related for automata, assuming that the number of transition annotations is constant and that the automata are more-or-less deterministic.

I haven't worried too much about this in the past - the cases I work with aren't very big. But, well, I'm doing a major refactoring of my automata code, and I thought I might as well do the math properly for some key algorithms as I go.

EDIT

I'm also increasingly convinced that the "n (log n) (log n)" is wrong.

If a is O(b log b) where b is O(log c), what is a in terms of c?


Solution

Here's a proof by contradiction:

Let's say f(n) = n (log n)(log n), and assume it were Θ(n log n); then in particular f(n) is O(n log n), i.e. there is a constant a for which f(n) <= a * n log n holds for all sufficiently large n.

Now take any n >= 2^(a+1) (logarithms to base 2). Then:

f(n) = n * log n * log n >= n * log n * (a+1), which is strictly larger than a * n * log n. So the assumed bound fails for every n >= 2^(a+1), however large, and we have a contradiction. Therefore f(n) is not O(n log n), and in particular not Θ(n log n).
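The same argument, written a little more formally (logarithms to base 2):

    \[ f(n) = n(\log n)^2, \qquad \text{assume } \exists\, a > 0:\ f(n) \le a\, n \log n \text{ for all sufficiently large } n. \]
    \[ \text{For every } n \ge 2^{a+1}:\quad f(n) = n \log n \cdot \log n \ \ge\ (a+1)\, n \log n \ >\ a\, n \log n, \]
    \[ \text{so the bound fails for arbitrarily large } n, \text{ hence } f(n) \notin O(n \log n). \]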

OTHER TIPS

In general, you can't just multiply complexities like this: for heap sort, N is the number of items in the heap, whereas for the big integers, N probably refers to the upper bound of the possible values. The two don't have to be related, so the bound is really N log N log M (where M is the range the items may take).

In a specific application, most likely, the large integers follow some specific distribution. For example, it may be known that they are all below 10^20. If so, the comparison operations take constant time (determined by an upper bound of 10^20). Then, log M is also constant, and the entire complexity is in O(N log N).
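As a rough sketch of that cost model (the numbers and names below are made up for illustration), the per-comparison cost is governed by M, the value range, while the number of comparisons is governed by N:

    import math

    # Illustrative cost model: comparing two integers below M costs about
    # log2(M) digit operations, independent of how many items N we sort.

    def comparison_cost(m_bound):
        return max(1, math.ceil(math.log2(m_bound)))

    N = 10_000      # number of items in the heap
    M = 10**20      # known upper bound on the item values

    total_cost = N * math.ceil(math.log2(N)) * comparison_cost(M)   # ~ N log N log M
    print(total_cost)

If M is fixed by the application, comparison_cost(M) is a constant and only the N log N factor remains.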

You won't be able to reduce n (log n) (log n) to n (log n) simply because that's not a reduction by a constant factor.

What's wrong with n (log n)^2?
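Indeed, the ratio between the two grows without bound, so it cannot be absorbed into a constant factor:

    \[ \frac{n(\log n)^2}{n \log n} = \log n \ \longrightarrow\ \infty \quad \text{as } n \to \infty. \]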

Okay, the general complexity measure of a program is the following:

Complexity O(f(n)) means that there exists a constant c such that the number of steps the corresponding Turing machine makes before it terminates is at most c*f(n), where n is the length of the input.
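Written symbolically, this is the usual definition (same statement, just compacted):

    \[ \exists\, c > 0,\ n_0 \ \text{ such that for every input } x \text{ with } |x| = n \ge n_0:\quad \mathrm{steps}(x) \le c \cdot f(n). \]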

In this definition everything is taken into account: the integers may be arbitrarily big, and arithmetic operations on them also contribute to the function under the O.

But if we were programming Turing machines directly, your question wouldn't have arisen. In the real world we prefer to abstract. While we still count the "steps" needed to run the program, we assume that certain operations take one step each. We assume that for different reasons:

  • It may resemble the actual work of the CPU, where each 32-bit integer addition is indeed one step (and there exist algorithms that actually exploit this, e.g. "bit-vector dominators").
  • We compare different algorithms in the same domain (for example, comparing array sorts by counting the number of comparisons).

In this case (your first EDIT), if you want to concretize your complexity measure, you should just multiply the functions under the O. If what you thought took one step is now considered to take O(log N) steps, then the number of concretized steps increases by a factor of O(log N). Therefore the total complexity is O(N log N log N).
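Here is a small sketch of that concretization (merge sort stands in for any O(N log N) comparison sort; the log N weighting is purely illustrative): it counts the abstract comparison steps, then charges each of them log N instead of 1:

    import math
    import random

    def merge_sort(a, counter):
        # Standard merge sort that counts comparisons in counter[0].
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left = merge_sort(a[:mid], counter)
        right = merge_sort(a[mid:], counter)
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            counter[0] += 1                 # one abstract "step"
            if left[i] <= right[j]:
                out.append(left[i])
                i += 1
            else:
                out.append(right[j])
                j += 1
        return out + left[i:] + right[j:]

    n = 1 << 12
    counter = [0]
    merge_sort([random.randrange(n) for _ in range(n)], counter)

    unit_cost = counter[0]                                 # ~ n log n unit-cost steps
    weighted_cost = unit_cost * math.ceil(math.log2(n))    # each step charged O(log n)
    print(unit_cost, weighted_cost)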


As to your second EDIT, it's a different situation. Let your algorithm have a complexity of O(N log N). But you know that the input consists of M numbers, each of log K digits, so N = O(M log K) (we need to account for separators, etc.). It's then mathematically correct to write the overall complexity as O(M * log K * (log M + log log K)), so there's no problem here. But usually we abstract away unnecessary details in order to find a common basis on which different algorithms can be compared.
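Spelled out, the substitution is just the following (the base of the logarithms doesn't matter):

    \[ N = O(M \log K) \ \Longrightarrow\ N \log N = O\bigl(M \log K \,(\log M + \log\log K)\bigr). \]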

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow