Question

Recently in an interview I was asked several questions related to the Big-O of various algorithms that came up in the course of the technical questions. I don't think I did very well on this... In the ten years since I took programming courses where we were asked to calculate the Big-O of algorithms, I have not had a single discussion about the Big-O of anything I have worked on or designed. I have been involved in many discussions with other team members and with the architects I have worked with about the complexity and speed of code, but I have never been part of a team that actually used Big-O calculations on a real project. The discussions are always "is there a better or more efficient way to do this given our understanding of our data?" Never "what is the complexity of this algorithm?"

I was wondering if people actually have discussions about the "Big-O" of their code in the real world?


Solution

It's not so much using it, it's more that you understand the implications.

There are programmers who do not realise the consequences of using an O(N^2) sorting algorithm.

I doubt many apart from those working in academia would use Big-O Complexity Analysis in anger day-to-day.

OTHER TIPS

No needless n-squared

In my experience you don't have many discussions about it, because it doesn't need discussing. In practice, in my experience, all that ever happens is you discover something is slow and see that it's O(n^2) when in fact it could be O(n log n) or O(n), and then you go and change it. There's no discussion other than "that's n-squared, go fix it".

So yes, in my experience you do use it pretty commonly, but only in the basest sense of "decrease the order of the polynomial", and not in some highly tuned analysis of "yes, but if we switch to this crazy algorithm, we'll go from log N down to the inverse of Ackermann's function" or some such nonsense. For anything finer-grained than that, the theory goes out the window and you switch to profiling (e.g. even to decide between O(n) and O(n log n), measure real data).
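To make the "that's n-squared, go fix it" case concrete, here is a minimal Python sketch (the scenario and function names are invented for illustration): finding duplicates with a nested loop is O(n^2), while remembering what has already been seen in a set brings it down to O(n).

    def find_dupes_quadratic(items):
        # Compares every pair of elements: O(n^2).
        dupes = []
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                if a == b and a not in dupes:
                    dupes.append(a)
        return dupes

    def find_dupes_linear(items):
        # Tracks what has been seen in a set: O(n) on average.
        seen, dupes = set(), set()
        for item in items:
            if item in seen:
                dupes.add(item)
            seen.add(item)
        return list(dupes)

Both return the duplicated values; the only difference is the order of growth, which is exactly the property Big-O captures.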

Big-O notation is rather theoretical, while in practice, you are more interested in actual profiling results which give you a hard number as to how your performance is.

You might have two sorting algorithms which by the book have O(n^2) and O(n log n) upper bounds, but profiling might show that the more efficient one carries some overhead (which is not reflected in the theoretical bound) and that, for the specific problem set you are dealing with, the theoretically less efficient sorting algorithm is the better choice.

Bottom line: in real life, profiling results usually take precedence over theoretical runtime bounds.
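A quick way to get those hard numbers is to measure on inputs like yours instead of arguing from the bounds. The sketch below (sizes and data are placeholders) times a low-overhead O(n^2) insertion sort against the built-in O(n log n) sorted() on a deliberately small list; which one wins depends on your machine and your data, which is the point.

    import random
    import timeit

    def insertion_sort(seq):
        # Textbook O(n^2) sort, but with very little overhead per step.
        out = list(seq)
        for i in range(1, len(out)):
            key = out[i]
            j = i - 1
            while j >= 0 and out[j] > key:
                out[j + 1] = out[j]
                j -= 1
            out[j + 1] = key
        return out

    data = [random.random() for _ in range(20)]   # deliberately tiny input
    print(timeit.timeit(lambda: insertion_sort(data), number=10000))
    print(timeit.timeit(lambda: sorted(data), number=10000))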

I do, all the time. When you have to deal with "large" numbers, typically in my case: users, rows in database, promotion codes, etc., you have to know and take into account the Big-O of your algorithms.

For example, an algorithm that generates random promotion codes for distribution could be used to generate billions of codes... Using an O(N^2) algorithm to generate unique codes means weeks of CPU time, whereas an O(N) algorithm means hours.
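A hedged sketch of that difference, with invented helper names: the only change between the two generators below is how uniqueness is checked, and that alone moves the overall cost from O(N^2) to roughly O(N).

    import random
    import string

    def random_code(length=10):
        return "".join(random.choices(string.ascii_uppercase + string.digits, k=length))

    def generate_codes_quadratic(n):
        codes = []
        while len(codes) < n:
            code = random_code()
            if code not in codes:      # scans the whole list: O(N) per code, O(N^2) overall
                codes.append(code)
        return codes

    def generate_codes_linear(n):
        codes = set()
        while len(codes) < n:
            codes.add(random_code())   # set lookups are O(1) on average, O(N) overall
        return list(codes)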

Another typical example is running queries in a loop (bad!). People query a table, then perform another query for each row returned; this brings the order up to N^2. You can usually rework the code to use SQL properly and bring it down to O(N) or O(N log N).
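A minimal sketch of that trap using an in-memory SQLite database (the customers/orders tables are made up for illustration): the first version issues one extra query per row, the second lets a single JOIN do the work.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (1, 1, 9.50), (2, 2, 12.00), (3, 1, 3.25);
    """)

    def totals_per_customer_slow():
        # One query for the customers, then one more query per customer row.
        result = {}
        for cust_id, name in conn.execute("SELECT id, name FROM customers"):
            (total,) = conn.execute(
                "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
                (cust_id,),
            ).fetchone()
            result[name] = total
        return result

    def totals_per_customer_fast():
        # A single round trip: the database joins and aggregates for us.
        rows = conn.execute("""
            SELECT c.name, COALESCE(SUM(o.total), 0)
            FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
            GROUP BY c.id
        """)
        return dict(rows)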

So, in my experience, profiling is useful ONLY AFTER the correct class of algorithm is used. I use profiling to catch bad behaviour, for example to understand why an application bound to "small" numbers under-performs.

The answer from my personal experience is - No. Probably the reason is that I use only simple, well-understood algorithms and data structures, whose complexity analysis was done and published decades ago. Why we should avoid fancy algorithms is better explained by Rob Pike here. In short, a practitioner almost never has to invent new algorithms and, as a consequence, almost never has to use Big-O.

Well that doesn't mean that you should not be proficient in Big-O. A project might demand the design and analysis of an altogether new algorithm. For some real-world examples, please read the "war stories" in Skiena's The Algorithm Design Manual.

To the extent that I know that three nested for-loops are probably worse than one nested for-loop. In other words, I use it as a reference gut feeling.

I have never calculated an algorithm's Big-O outside of academia. If I have two ways to approach a certain problem, if my gut feeling says that one will have a lower Big-O than the other one, I'll probably instinctively take the smaller one, without further analysis.

On the other hand, if I know for certain the size of n that comes into my algorithm, and I know for certain it to be relatively small (say, under 100 elements), I might take the most legible one (I like to know what my code does even one month after it has been written). After all, the difference between 100^2 and 100^3 executions is hardly noticeable by the user with today's computers (until proven otherwise).

But, as others have pointed out, the profiler has the last and definite word: If the code I write executes slowly, I trust the profiler more than any theoretical rule, and fix accordingly.

I try to hold off optimizations until profiling data proves they are needed. Unless, of course, it is blatantly obvious at design time that one algorithm will be more efficient than the other options (without adding too much complexity to the project).

Yes, I use it. And no, it's not often "discussed", just like we don't often discuss whether "orderCount" or "xyz" is a better variable name.

Usually, you don't sit down and analyze it, but you develop a gut feeling based on what you know, and can pretty much estimate the O-complexity on the fly in most cases.

I typically give it a moment's thought when I have to perform a lot of list operations. Am I doing any needless O(n^2) complexity stuff, that could have been done in linear time? How many passes am I making over the list? It's not something you need to make a formal analysis of, but without knowledge of big-O notation, it becomes a lot harder to do accurately.
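For example, one of the most common of those needless O(n^2) patterns (the names below are invented) is testing membership in a list inside a loop; converting the list to a set first turns an O(n*m) job into roughly O(n + m).

    def emails_to_notify_slow(all_emails, unsubscribed):
        # "e not in unsubscribed" scans the whole list on every iteration: O(n*m).
        return [e for e in all_emails if e not in unsubscribed]

    def emails_to_notify_fast(all_emails, unsubscribed):
        blocked = set(unsubscribed)                          # one pass to build the set
        return [e for e in all_emails if e not in blocked]   # O(1) average lookups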

If you want your software to perform acceptably on larger input sizes, then you need to consider the big-O complexity of your algorithms, formally or informally. Profiling is great for telling you how the program performs now, but if you're using an O(2^n) algorithm, your profiler will tell you that everything is just fine as long as your input size is tiny. And then your input size grows, and runtime explodes.

People often dismiss big-O notation as "theoretical", or "useless", or "less important than profiling". Which just indicates that they don't understand what big-O complexity is for. It solves a different problem than a profiler does. Both are essential in writing software with good performance. But profiling is ultimately a reactive tool. It tells you where your problem is, once the problem exists.

Big-O complexity proactively tells you which parts of your code are going to blow up if you run it on larger inputs. A profiler can not tell you that.
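The classic textbook illustration of that blow-up is naive recursive Fibonacci, sketched below: a profiler run on small n reports nothing wrong, yet the exponential version becomes unusable as n grows, while memoizing it makes the cost linear.

    from functools import lru_cache

    def fib_naive(n):
        # Roughly O(2^n) calls: harmless at n = 10, hopeless at n = 200.
        if n < 2:
            return n
        return fib_naive(n - 1) + fib_naive(n - 2)

    @lru_cache(maxsize=None)
    def fib_memo(n):
        # O(n) distinct calls once results are cached.
        if n < 2:
            return n
        return fib_memo(n - 1) + fib_memo(n - 2)

    print(fib_naive(10), fib_memo(10))   # both instant on tiny input
    print(fib_memo(200))                 # still instant; fib_naive(200) would never finish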

No. I don't use Big-O complexity in 'real world' situations.

My view on the whole issue is this - (maybe wrong... but it's just my take.)

The Big-O complexity stuff is ultimately about understanding how efficient an algorithm is. If, from experience or by other means, you understand the algorithms you are dealing with and are able to use the right algorithm in the right place, that's all that matters.

If you know this Big-O stuff and are able to use it properly, well and good.

If you don't know how to talk about algorithms and their efficiency in the mathematical way - the Big-O stuff - but you know what really matters - the best algorithm to use in a situation - that's perfectly fine.

If you don't know either, it's bad.

Although you rarely need to do deep big-o analysis of a piece of code, it's important to know what it means and to be able to quickly evaluate the complexity of the code you're writing and the consequences it might have.

At development time you often feel like it's "good enough". Eh, no one will ever put more than 100 elements in this array, right? Then, one day, someone puts 1000 elements in the array (trust users on that: if the code allows it, one of them will do it). And the n^2 algorithm that was good enough is now a big performance problem.

It's sometimes useful the other way around: if you know that you functionally have to perform n^2 operations and the complexity of your algorithm happens to be n^3, there might be something you can do about it to make it n^2. Once it's n^2, you'll have to work on smaller optimizations.
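A small invented example of that n^3-to-n^2 reduction: summing every contiguous slice of a list inherently needs on the order of n^2 results, but recomputing each slice sum from scratch adds a third factor that can be hoisted out.

    def all_slice_sums_cubic(xs):
        # sum() rescans each slice from scratch: O(n^3) total.
        return [sum(xs[i:j])
                for i in range(len(xs))
                for j in range(i + 1, len(xs) + 1)]

    def all_slice_sums_quadratic(xs):
        # Reuse the previous slice's sum: O(n^2) total, same results in the same order.
        sums = []
        for i in range(len(xs)):
            running = 0
            for j in range(i, len(xs)):
                running += xs[j]
                sums.append(running)
        return sums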

On the contrary, if you just wrote a sorting algorithm and find out that it has linear complexity, you can be sure there's a problem with it. (Of course, in real life, occasions where you have to write your own sorting algorithm are rare, but I once saw someone in an interview who was plainly satisfied with his single-for-loop sorting algorithm.)
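For the curious, that single-for-loop "sort" is easy to reproduce as a sketch: one bubble pass only guarantees that the largest element ends up last, so a quick test shows why a genuinely linear comparison sort should raise suspicion.

    def one_pass_bubble(seq):
        # A single O(n) pass: swaps adjacent out-of-order pairs once.
        out = list(seq)
        for i in range(len(out) - 1):
            if out[i] > out[i + 1]:
                out[i], out[i + 1] = out[i + 1], out[i]
        return out

    print(one_pass_bubble([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 3, 1, 4, 5, 2, 6, 9] -- not sorted
    print(sorted([3, 1, 4, 1, 5, 9, 2, 6]))            # [1, 1, 2, 3, 4, 5, 6, 9]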

Yes, for server-side code one bottleneck can mean you can't scale, because you get diminishing returns no matter how much hardware you throw at the problem.

That being said, there are often other reasons for scalability problems, such as blocking on file and network access, which are much slower than any internal computation you'll see, which is why profiling is more important than Big-O.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow