Question

According to Wikipedia, the 90/10 rule of program optimization states that "90% of a program's execution time is spent executing 10% of the code" (see the second paragraph here).

I really don't understand this. What exactly does this mean? How can 90% of the execution time be spent only executing 10% of the code? What about the other 90% of the code then? How can it be executed in just 10% of the time?


Solution

There are two basic principles in play here:

  • Some code is executed much more often than other code. For example, some error handling code might never be used. Some code will be executed only when you start your program. Other code will be executed over and over while your program runs.
  • Some code takes much longer to run than other code. For example, a single line that runs a query on a database, or pulls a file from the internet, will probably take longer than millions of mathematical operations (the sketch after this list illustrates this).
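A quick way to feel the second point is to time both kinds of work. The Python sketch below is only illustrative: simulated_io_call uses time.sleep as a stand-in for a real database query or network fetch, and the 100 ms delay is an assumption, not a measurement.

import time

def million_math_ops():
    # A million cheap arithmetic operations.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total

def simulated_io_call():
    # Stand-in for a single database query or network fetch
    # (the ~100 ms figure is an assumption for this sketch).
    time.sleep(0.1)

start = time.perf_counter()
million_math_ops()
print(f"1,000,000 math ops:     {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
simulated_io_call()
print(f"one simulated I/O call: {time.perf_counter() - start:.3f}s")

On most machines the single simulated call is comparable to, or slower than, the whole million-iteration loop, even though it is one line of code.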

The 90/10 rule isn't literally true. It varies by program (and I doubt there is any basis to the specific numbers 90 and 10; someone probably pulled them out of thin air). But the point is: if you need your program to run faster, probably only a small number of lines matters for making that happen. Identifying the slow parts of your software is often the biggest part of optimisation.
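As a hedged illustration of how you might find those slow parts in practice, Python ships a profiler, cProfile, that reports how much time each function consumed. The functions below are invented placeholders; the point is only that the hot loop rises to the top of the report.

import cProfile

def startup_work():
    # Runs once at program start.
    return sum(range(1_000))

def hot_loop():
    # The kind of code that usually dominates a profile.
    total = 0
    for i in range(2_000_000):
        total += i % 7
    return total

def main():
    startup_work()
    hot_loop()

if __name__ == "__main__":
    # Print per-function timings, sorted by cumulative time;
    # hot_loop should top the list.
    cProfile.run("main()", sort="cumulative")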

This is an important insight, and it means that decisions that seem counterintuitive to a new developer can often be correct. For example:

  • There is lots of code that is not worth your time to make "better", even if it is doing things in a dumb, simplistic way. Could you write a more efficient search algorithm for application XYZ? Yes, but a simple comparison of every value takes a trivial amount of time, even though there are thousands of values (the sketch after this list puts a rough number on that). So it's just not worth it. It can be tough for new developers to avoid unnecessary optimisation, because in their degree programs so much time was spent on writing the "correct" (meaning most efficient) algorithm. But in the real world, the correct algorithm is any one that works and runs fast enough.
  • Changes that make your code much longer and more complex may still be a performance win. For example, in application FOO it may be worth adding hundreds of lines of new logic, just to avoid a single database call.
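A rough sketch of the first bullet, with made-up data sizes, timing the "dumb" compare-every-value search:

import time

values = list(range(5_000))                  # "thousands of values", invented for this sketch
target = 4_321

start = time.perf_counter()
found = any(v == target for v in values)     # the simple compare-every-value search
elapsed = time.perf_counter() - start

print(f"found={found} in {elapsed * 1_000_000:.0f} microseconds")
# On typical hardware this completes in well under a millisecond, so a cleverer
# search algorithm would save effectively nothing here.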

OTHER TIPS

This isn't a law of nature, but a rule of thumb borne out by wide experience. It is also known as the 80/20 rule, and is only ever a rough approximation.

Loops, Branches and other flow control.

Each place that has an if, you will have one branch that is taken more often than the other branch. Thus more of the execution time is spent executing that part of the program, and not the other part.

Each place that has a loop that runs more than once, you have code that gets executed more than surrounding code. Thus more time is spent there.

As an example, consider:

def DoSomeWork():
    try:
        for i in range(1000000):
            DoWork(i)          # runs about a million times
    except WorkException:
        print("Oh No!")        # runs at most once

Here print("Oh No!") will run at most once, and often never, whereas DoWork(i) will run about a million times.

Loops.

I'm tempted to stop there! :-)

Consider this program:

1. do_something

2. loop 10 times
3.    do_another_thing

4.    loop 5 times
5.        do_more_stuff

Line 1 is executed once, whilst line 3 is executed 10 times. Looking at each line in turn:

Line   Times run   % of total
1        1           0.8%
2       10           8.3%
3       10           8.3%
4       50          41.3%
5       50          41.3%

Two lines account for roughly 83% of the execution time (assuming all lines take about the same time to run). So 40% of the program takes more than 80% of the time.
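As a sanity check on those percentages, here is a small Python sketch that translates the pseudocode above directly and counts how often each line runs (the labels mirror the line numbering above):

from collections import Counter

counts = Counter()

counts["1: do_something"] += 1
for _ in range(10):                  # the outer loop (line 2)
    counts["2: loop 10 times"] += 1
    counts["3: do_another_thing"] += 1
    for _ in range(5):               # the inner loop (line 4)
        counts["4: loop 5 times"] += 1
        counts["5: do_more_stuff"] += 1

total = sum(counts.values())
for line, n in counts.items():
    print(f"line {line}: {n:3d} times ({n / total:.1%})")
# Lines 4 and 5 each run 50 times out of 121 total executions, about 41% each.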

With larger, more realistic examples this effect only grows, so a small number of lines accounts for most of the run time.

The 90/10 rule (or, as it's sometimes put, 80/20) is a "rule of thumb": only approximately true.

See also Pareto Principle

As you asked about the execution time only, this example might be helpful:

#include <unistd.h>   /* for sleep() */

int main() {
    sleep(90); // approximately 10% of the program.
    // other 90% of the program:
    sleep(1);
    sleep(1);
    sleep(1);
    sleep(1);
    sleep(1);
    sleep(1);
    sleep(1);
    sleep(1);
    sleep(1);
    sleep(1);
    return 0;
}

To be a little more serious: in real-life code, the heavy 90% is almost always some expensive function called in a loop (rather than sleep(90);), while the remaining 10% of the time goes to single-pass computations.

Another example is error handling in a highly-available (HA) service. Such a service is designed to run indefinitely under normal conditions. It operates normally 99% of the time, but occasionally, when an error occurs, it runs error handling and recovery code that may be even more logically complex than the service itself.
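A hedged sketch of that shape, with all names invented for illustration: the normal path is a tight loop, while the recovery path is heavier but almost never runs.

import random
import time

class TransientError(Exception):
    """Invented error type for this sketch."""

def handle_request():
    # Normal path: runs constantly, the vast majority of the time.
    if random.random() < 0.001:        # assume roughly 1 request in 1000 fails
        raise TransientError("backend unavailable")
    return "ok"

def recover():
    # Rarely executed, but logically heavier than the happy path: in a real
    # service this might re-establish connections, replay a journal,
    # notify operators, and so on.
    time.sleep(0.5)                    # simulate expensive recovery work

def serve(requests=10_000):
    for _ in range(requests):
        try:
            handle_request()
        except TransientError:
            recover()

serve()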

The 90/10 reasoning means that a small part of your code is repeated or used far more than the rest. This is often used to suggest that you should concentrate 90% of your development/optimization effort on that 10% of your code.

Think of a normal text processor, like Microsoft Word or OpenOffice:

  • The preferences dialog is not used much;
  • The subroutines that draw characters are used all the time.

This saying is also used in management science... It is a lesson for life itself... Meaning: concentrate most of your effort where it gives you the most results.

Imagine a program like this:

print "H"
print "e"
print "l"
print "l"
print "o"
for i=0 to 1,000,000
    print "How long now?"
next
print "B"
print "y"
print "e"

Notice that there are 11 lines here, and 3 of them make up the for loop. How much time is spent on that rather small piece of code? Quite a bit, while the other 8 lines merely print a single character each. So beware: the fact that a piece of code is short tells you nothing about how often it is executed or how much time it will take.

In addition to looping, as mentioned by other great answers, there's also the DRY principle to consider. Well-written, object-oriented code has a lot of reusable parts. Those parts that are reused, by definition, get used at least twice as often as something that is only executed once. If you have a lot of OO code, you could potentially be reusing a few classes and methods many times, and a few other pieces of code only once.
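A rough way to see that effect is to count calls with a decorator; the function names below are invented for illustration.

from functools import wraps

call_counts = {}

def counted(fn):
    # Decorator that tallies how many times each function is called.
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] = call_counts.get(fn.__name__, 0) + 1
        return fn(*args, **kwargs)
    return wrapper

@counted
def load_config():
    # Run-once setup code.
    return {"mode": "default"}

@counted
def normalize(text):
    # Small reusable helper, called from many places.
    return text.strip().lower()

config = load_config()
records = ["  Alpha ", "BETA", " gamma  "] * 1_000
cleaned = [normalize(r) for r in records]

print(call_counts)    # e.g. {'load_config': 1, 'normalize': 3000}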

As mentioned in other answers, it is usually better to spend effort improving the code that is used most often than improving the code that is only used a single time.

That's not a rule; that's just some dude who edited Wikipedia with a couple of numbers pulled out of thin air and called it a rule. Compare with the Pareto Principle, which is more firmly established in other contexts. I'd like to see what research has been done (if any) on the accuracy of this "rule".

But basically the answer to your question is: some code gets executed much, much more frequently than other code. Loops are often the reason for this. Other reasons are time-consuming calls, e.g. to external resources like web services or storage media.

It's a reinterpretation of the "Pareto principle", which states that "for many events, roughly 80% of the effects come from 20% of the causes", also known as the 80/20 rule. This rule is mostly applied to economics, so it makes sense that it'd be repurposed for programming.

It's just a pattern that has been observed over a long period of time.

Here's a very nice video on patterns like this, and it also explains the Pareto Principle.

https://www.youtube.com/watch?v=fCn8zs912OE&ab_channel=Vsauce

Licensed under: CC-BY-SA with attribution