Question

In university, in our algorithms courses, we learn how to precisely compute the complexity of various simple algorithms that are used in practice, such as hash tables or quicksort.

But now, in a big software project, when we want to make it faster, all we do is look at individual pieces (a few nested loops there that can be replaced by a faster hash table, a slow search here that can be sped up by a fancier technique), but we never compute the complexity of our whole pipeline.

Is there any way to do that? Or are people in practice just relying on "locally" using a fast algorithm to make the whole application faster, instead of globally considering the application as a whole?

(Because it seems to me to be nontrivial to show that if you pile up a large number of algorithms that are known to be very fast on their own, you also end up with a fast application as a whole.)

I am asking this because I have been tasked with speeding up a large project someone else has written, where a lot of algorithms interact and work on the input data, so it is unclear to me what impact making a single algorithm faster will have on the whole application.

Solution

Large software projects consist of many different components, and not all of them are typically a bottleneck. Quite the opposite: for almost any program I have seen in my life where low performance was an issue, the Pareto principle applied: more than 80% of the performance gains could be achieved by optimizing less than 20% of the code (in reality, I think the ratio was often more like 95% to 5%).

So starting with individual pieces is often the best approach. That is why profiling (as explained in David Arno's answer) is fine: it helps you identify the mentioned 5% of the code where optimizing will give you the "biggest bang for the buck". Optimizing "the whole application" bears a certain risk of over-engineering, and if you speed up the other 95% even by a factor of 10, it would often have no measurable effect. Note also that profiling tells you far more than any hypothetical complexity estimation, since a simple algorithm which needs O(N^3) steps can still be faster than a complex algorithm which requires O(N log N), as long as N is small enough.
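As a rough illustration of that last point, here is a minimal C# sketch (all class names, method names and sizes are invented for the example) that times a quadratic duplicate check against a HashSet-based linear one; the principle is the same as with O(N^3) versus O(N log N). Which version wins at a given N is exactly the kind of question measuring answers better than estimating:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class ComplexityVsReality
{
    // "Slow" O(N^2) algorithm: compare every pair.
    static bool HasDuplicateNaive(int[] items)
    {
        for (int i = 0; i < items.Length; i++)
            for (int j = i + 1; j < items.Length; j++)
                if (items[i] == items[j]) return true;
        return false;
    }

    // "Fast" O(N) algorithm: hash every element once.
    static bool HasDuplicateHashed(int[] items)
    {
        var seen = new HashSet<int>();
        foreach (var item in items)
            if (!seen.Add(item)) return true;
        return false;
    }

    static void Main()
    {
        var rng = new Random(42);
        foreach (var n in new[] { 8, 64, 512 })
        {
            var data = new int[n];
            for (int i = 0; i < n; i++) data[i] = rng.Next();

            var sw = Stopwatch.StartNew();
            for (int rep = 0; rep < 1000; rep++) HasDuplicateNaive(data);
            var naive = sw.Elapsed;

            sw.Restart();
            for (int rep = 0; rep < 1000; rep++) HasDuplicateHashed(data);
            var hashed = sw.Elapsed;

            // For small N, the allocation and hashing overhead of the "fast"
            // version can outweigh its better asymptotic behaviour.
            Console.WriteLine($"N={n}: naive={naive.TotalMilliseconds:F2} ms, hashed={hashed.TotalMilliseconds:F2} ms");
        }
    }
}
```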

After profiling has revealed the hot spots, one can optimize them. Of course, a "hot spot" can be bigger than just one or two lines of code; sometimes one has to replace a whole component to make it faster, but that will typically still be a small part of the code base in a larger program.

Typical optimization techniques include:

  • improving the use of algorithms and data structures

  • fine tuning of the former (see the sketch after this list)

  • micro-optimizations at some real hot spots

  • recoding critical sections using assembly code or CUDA
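
As a concrete example of the "fine tuning" bullet, here is a small, hedged C# sketch (the Record type and the method names are invented for the example): the algorithm and its complexity stay the same, but reserving the dictionary's capacity up front avoids repeated rehashing while it grows.

```csharp
using System.Collections.Generic;

// Hypothetical record type, invented for this sketch.
public sealed record Record(string Key, string Payload);

public static class IndexBuilder
{
    // Straightforward version: correct, O(N), but the dictionary grows and
    // rehashes several times while the records are inserted.
    public static Dictionary<string, Record> BuildIndex(IReadOnlyList<Record> records)
    {
        var index = new Dictionary<string, Record>();
        foreach (var record in records)
            index[record.Key] = record;
        return index;
    }

    // Fine-tuned version: same algorithm, same complexity, but the capacity
    // is reserved up front, so the intermediate rehashing and reallocation
    // disappear. Only profiling can tell whether this matters in practice.
    public static Dictionary<string, Record> BuildIndexTuned(IReadOnlyList<Record> records)
    {
        var index = new Dictionary<string, Record>(records.Count);
        foreach (var record in records)
            index[record.Key] = record;
        return index;
    }
}
```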

Note that these techniques work on different levels of abstraction, some of them viewing a component more "as a whole" than others. So it depends on what you mean by "all we do is look at individual pieces": if you only had micro-optimizations in mind, I disagree that "we" work only on those. But if you mean applying those full-scale optimizations to isolated parts or components, then "we" are probably working on the right parts, and you should question your expectations.

OTHER TIPS

The standard, tried and tested way is to profile the code. You perform dynamic analysis of the running system to measure timings, memory usage, etc. Then you analyse the results to find performance bottlenecks.

Those bottlenecks are then experimentally rewritten and the result is profiled once more to confirm that a speed increase, a reduction in memory use, etc. has been achieved. This process is repeated until an acceptable performance gain is reached.
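
A hedged sketch of that loop in C# (the stage names and the dummy work are invented; a real profiler such as PerfView or dotTrace gives far more detail): wrap each pipeline stage in a timer, look at the numbers, change the slowest stage, and measure again.

```csharp
using System;
using System.Diagnostics;

static class Profiling
{
    // Minimal hand-rolled instrumentation: run a step, log how long it took,
    // and return its result. Measure, don't guess.
    static T Timed<T>(string label, Func<T> step)
    {
        var sw = Stopwatch.StartNew();
        var result = step();
        sw.Stop();
        Console.WriteLine($"{label}: {sw.ElapsedMilliseconds} ms");
        return result;
    }

    static void Main()
    {
        // Hypothetical pipeline stages, replaced here by dummy work.
        var data = Timed("load", () => new int[1_000_000]);

        var sum = Timed("aggregate", () =>
        {
            long total = 0;
            foreach (var x in data) total += x;
            return total;
        });

        Console.WriteLine($"result: {sum}");
    }
}
```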

Although the other answers are correct and provide some guidance, I do think they miss a step. In a complex system like the one you are working with now, understanding the different components that make up the system is key to understanding why something is slow.

My first step would be to get my hands on a detailed architecture diagram or create one myself. Figure out what steps are taken by what components in the software and how long each step takes.

Also, figure out how the components interact with each other. This can make all the difference.

For example, I've seen C# code where the interface between two components was an IEnumerable built by the first component, which was then enumerated by the second component. In C# this means execution keeps switching between the consumer and the producer's iterator, which can be costly under certain circumstances. The fix has no impact on the algorithm itself: a simple .ToList() that collects the result before the next step solves the issue.
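
A minimal sketch of that situation, with all class and method names invented for the example: the producer's work is deferred until the consumer enumerates the sequence, and a .ToList() at the component boundary materializes the result once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical first component: yields its results lazily, so nothing runs
// until someone enumerates the sequence, and it runs again on every
// re-enumeration.
static class Producer
{
    public static IEnumerable<int> ExpensiveResults()
    {
        for (int i = 0; i < 1000; i++)
            yield return Compute(i);
    }

    static int Compute(int i) => i * i;   // stand-in for the real work
}

// Hypothetical second component: consumes what the first one produced.
static class Consumer
{
    static void Main()
    {
        // Deferred: the producer runs interleaved with the consumer, once
        // per enumeration of the sequence.
        IEnumerable<int> lazy = Producer.ExpensiveResults();
        Console.WriteLine(lazy.Sum());

        // Materialized: .ToList() runs the producer exactly once, up front,
        // and the consumer then works on the buffered results.
        List<int> buffered = Producer.ExpensiveResults().ToList();
        Console.WriteLine(buffered.Sum());
    }
}
```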

Another thing to consider is the impact of the system you are running the code on. Hardware interactions can obviously be a factor in complex systems. Look for disk IO, large memory allocations, and network IO. Sometimes these can be addressed more efficiently by tweaking the system or even replacing hardware.
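
Disk IO is a common case: a tight loop of tiny writes can be far slower than one buffered write, independent of any algorithmic complexity. A hedged C# sketch (class and method names invented for the example):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LogWriter
{
    // Costly pattern: the file is opened, written and flushed on every call,
    // so a tight loop turns into thousands of small disk operations.
    public static void AppendLineSlow(string path, string line) =>
        File.AppendAllText(path, line + Environment.NewLine);

    // Cheaper pattern for bulk output: one open, buffered writes, one flush.
    public static void AppendLinesBuffered(string path, IEnumerable<string> lines)
    {
        using var writer = new StreamWriter(path, append: true);
        foreach (var line in lines)
            writer.WriteLine(line);
    }
}
```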

Licensed under: CC-BY-SA with attribution