Question

For instance, if I have an algorithm that is O(n²) and it runs for 10 seconds for a problem size of 1000, and I then double the problem size to 2000, I want to know the approximate run time in seconds. I know the answer, but I want to understand the logic of how to get there. So here is what I am thinking.

N = 1000, therefore 1000^2 = 10 seconds
N = 2000, therefore (2*1000)^2 = // stuck here

Now I am not sure. I mean, I know it will be 40 seconds, because 2 to the power of 2 is 4 and you then multiply the 10 seconds by 4 to get 40 seconds, but I am not entirely sure this reasoning is right. Could someone break it down in simple terms?

Solution

Complexity classes such as O(n) or O(n²) are not meant to calculate actual running time. Instead, they are meant to compare how different algorithms scale when we change the input size.

For example, here we have two algorithms that apply frobnicate(a, b) to each matching item:

void algorithm1(Set<Integer> items) {
  for (int i : items) {      // n iterations
    for (int j : items) {    // n iterations per outer iteration, i.e. n² pairs
      if (i == j) {
        frobnicate(i, j);
      }
    }
  }
}

void algorithm2(Set<Integer> items) {
  for (int i : items) {      // n iterations, one call per item
    frobnicate(i, i);
  }
}

Obviously, the first algorithm has O(n²) complexity while the second is O(n). Therefore, we can conclude that algorithm2 scales better for large inputs. But here is another O(n) algorithm:

void algorithm3(Set<Integer> items) throws InterruptedException {
  Thread.sleep(9000);        // fixed 9-second overhead, independent of n
  for (int i : items) {
    Thread.sleep(1);         // 1 millisecond of extra work per item
    frobnicate(i, i);
  }
}

This scales just as well as algorithm2 but will always be slower! In fact, the O(n²) algorithm1 will outperform this algorithm for small inputs! Because algorithm3 has a fixed 9-second overhead, we can't plug different numbers into the O(n) complexity formula to find out how long a different input size will take.

Example:

n=1000 => algorithm3 takes 10 seconds
n=2000 => algorithm3 takes 11 seconds, not 20 seconds!
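
These numbers follow directly from the sleeps in algorithm3 (ignoring the cost of frobnicate itself):

f(n)    = 9 seconds + n · 1 millisecond
f(1000) = 9 seconds + 1 second  = 10 seconds
f(2000) = 9 seconds + 2 seconds = 11 seconds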

In reality, it is not possible to reliably calculate the running time. Even if you could figure out a cost for each atomic operation, hardware quirks such as cache boundaries will get in your way. The only reliable option is to do statistical sampling over different input sizes, and then fit an estimation function matching the expected complexity class to the collected data.
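
As a rough sketch of what such sampling and fitting could look like, assuming the pure a·n² model used below (the class and method names here are made up for illustration; a real benchmark would also need warm-up runs and repeated samples per size):

import java.util.function.IntConsumer;

class ScalingEstimator {
  // Run `task` once for each input size, measure the wall-clock time,
  // and fit t(n) ≈ a·n² by least squares through the origin.
  static double fitQuadraticCoefficient(IntConsumer task, int[] sizes) {
    double num = 0, den = 0;
    for (int n : sizes) {
      long start = System.nanoTime();
      task.accept(n);
      double seconds = (System.nanoTime() - start) / 1e9;
      double n2 = (double) n * n;
      num += seconds * n2;
      den += n2 * n2;
    }
    return num / den; // the fitted coefficient a, in seconds per n²
  }
}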

Let's assume we have a run time estimation function (!= complexity class) of f(n) = a·n² + b·n + c, and let's assume that b = c = 0 for simplicity (i.e. no linear factors, no constant overhead). Then, we can calculate the parameter a from the data point f(1000) = 10 seconds:

10 seconds = f(1000)
           = a·1000²
           = a·1000000
 <=> a = 10/1000000 seconds = 1/100000 seconds

Therefore, f(n) = n²/100000 seconds. If we plug in n = 2000, we get

f(2000) = 2000²/100000 seconds
        = 4000000/100000 seconds
        = 40 seconds

This is consistent with the shortcut: if we multiply the input size by a factor of k (here k = 2), the running time of an O(n²) algorithm grows by a factor of k² (here k² = 4).
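
The same shortcut can be written as a tiny helper (hypothetical code, assuming a pure power-law model with no lower-order terms and no constant overhead):

// Estimate a new running time from a measured one, assuming t(n) = a·n^exponent.
static double scaleRunningTime(double measuredSeconds, double oldN, double newN, double exponent) {
  return measuredSeconds * Math.pow(newN / oldN, exponent);
}

// scaleRunningTime(10.0, 1000, 2000, 2) returns 40.0, the O(n²) case above.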

Other tips

To make it short:

The first sentence of amon's answer is important.

But your guess is correct, too: (2*1000)^2 is 4 times more than 1000^2. I.e. if 1000^2 corresponds to 10 seconds on a given piece of hardware, then (2*1000)^2 corresponds to 40 seconds. It's simply the rule of three.
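
Written out as a rule of three:

(2*1000)^2 / 1000^2 = 4
4 · 10 seconds = 40 seconds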

One more note: if you are dealing with the running time of an implementation of an algorithm, you should also test the performance of that implementation. Simple programming "mistakes" can make a huge difference in the real world.
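
A classic illustration of such a "mistake" (a hypothetical example, not from the original answer): joining strings by repeated concatenation copies the whole intermediate result on every iteration, while a StringBuilder does not, and the difference is dramatic for large inputs.

import java.util.List;

class JoinExample {
  // Careless version: each += copies the entire result built so far.
  static String joinSlow(List<String> parts) {
    String result = "";
    for (String p : parts) {
      result += p;
    }
    return result;
  }

  // Careful version: StringBuilder appends in amortized constant time.
  static String joinFast(List<String> parts) {
    StringBuilder sb = new StringBuilder();
    for (String p : parts) {
      sb.append(p);
    }
    return sb.toString();
  }
}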

Licensed under: CC-BY-SA with attribution