Question

Back in the "good ol' days," when we would copy shareware onto floppies for friends, we also used a fair bit of assembly. There was a common practice of "micro-optimization," where you would stare and stare at lines of assembly until you figured out a way to express it in one fewer instruction. There was even a saying, which was mathematically impossible, that "You can always remove one more instruction." Given that changing runtime performance by small constant factors isn't a major issue for (most) programming today, are programmers transferring these micro-optimization efforts elsewhere?

In other words, can a best practice be taken to such an extreme that it's no longer adding anything of value and is instead wasting time?

For example: Do programmers waste time generalizing private methods that are only called from one place? Is time wasted reducing test case data? Are programmers (still) overly concerned about reducing lines of code?

There are two great examples of what I'm looking for below: (1) Spending time finding the right variable names, even renaming everything; and (2) Removing even minor and tenuous code duplication.


Note that this is different from the question "What do you optimize for?", because I'm asking what other programmers seem to maximize, with the stigma of these being "micro" optimizations, and thus not a productive use of time.


Solution

Code Formatting

Don't     get    me   wrong             ,
code      should be   consistent        & 
readable                                , 
but       some   take it         too far.

OTHER TIPS

I used to write a lot of assembler back in the day. It is not just that compilers have gotten better; it is that most hardware now has lots of logic devoted to out-of-order execution of code. The real micro-issue is scheduling: most computer instructions take several machine clocks to produce a result, and a memory load that misses cache may take several hundred! So the idea was to schedule other instructions to do something useful instead of waiting for a result. And modern machines can issue several instructions per clock period.

Once we started getting out-of-order execution hardware, I found that trying to get great performance with hand coding became a mug's game. First, the out-of-order hardware would not execute the instructions in your carefully crafted order; second, the fancy new hardware architecture has reduced the penalty of suboptimal software scheduling enough that the compiler was usually within a few percent of your performance. I also found that compilers were now implementing well-known but complexity-generating tricks such as unrolling, bottom loading, software pipelining, etc. The bottom line: you have to work really, really hard. Skip some of these tricks and the compiler beats you; use them all and the number of assembler instructions you need increases several fold!
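
To make the unrolling point concrete, here is a minimal Java sketch (the array size and workload are invented): a plain loop next to a hand-unrolled version with independent accumulators. On a modern JIT the plain version is usually just as fast, because the compiler and the out-of-order hardware already do this kind of scheduling on their own.

// Hypothetical illustration: manual loop unrolling of a simple sum.
// Modern compilers/JITs typically apply this (and more) automatically,
// so hand-unrolling rarely pays off.
public class UnrollDemo {
    // Straightforward version: let the compiler schedule it.
    static long sumSimple(int[] a) {
        long total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    // Hand-unrolled by 4 with independent accumulators, mimicking the
    // "software scheduling" tricks described above.
    static long sumUnrolled(int[] a) {
        long t0 = 0, t1 = 0, t2 = 0, t3 = 0;
        int i = 0;
        int limit = a.length - (a.length % 4);
        for (; i < limit; i += 4) {
            t0 += a[i];
            t1 += a[i + 1];
            t2 += a[i + 2];
            t3 += a[i + 3];
        }
        for (; i < a.length; i++) { // leftover elements
            t0 += a[i];
        }
        return t0 + t1 + t2 + t3;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 1);
        System.out.println(sumSimple(data) + " " + sumUnrolled(data));
    }
}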

Probably even more important: most performance issues are not about instruction issue rates, but about getting the data into the CPU. As I mentioned above, memory latency is now hundreds of cycles and the CPU can execute several instructions per clock period, so unless the program, and especially the data structures, are designed so that the cache hit rate is exceedingly high, micro-tuning at the instruction level will have no payoff. Just as military types say: amateurs talk tactics, pros talk logistics. Performance programming is now more than 90% logistics (moving data).

And this is hard to quantify, as modern memory management typically has multiple levels of cache, and virtual memory pages are handled by a hardware unit called the TLB. Low-level alignment of addresses also becomes important, because actual data transfers do not come in units of bytes, or even 64-bit long-longs; they come in units of cache lines. Then most modern machines have hardware that tries to predict which cache line misses you might need in the near future and issues automatic prefetches to get them into the cache. So the reality is that with modern CPUs, performance models are so complex as to be almost incomprehensible. Even detailed hardware simulators can never match the exact logic of the chips, so exact tuning is simply impossible anymore.
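
A hedged illustration of the "logistics" point, in Java rather than assembler (the matrix size is arbitrary): both methods do identical arithmetic, but the row-major walk touches memory sequentially while the column-major walk strides across rows. On large arrays the cache behaviour, not the instruction count, dominates the difference.

// Hypothetical sketch of cache-friendly vs cache-hostile access order.
// Java stores int[][] as arrays of rows, so walking row-by-row reads
// memory sequentially; walking column-by-column jumps between rows.
public class TraversalDemo {
    static long sumRowMajor(int[][] m) {
        long total = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++)
                total += m[i][j];           // sequential within each row
        return total;
    }

    static long sumColumnMajor(int[][] m) {
        long total = 0;
        for (int j = 0; j < m[0].length; j++)
            for (int i = 0; i < m.length; i++)
                total += m[i][j];           // strides across rows: poor locality
        return total;
    }

    public static void main(String[] args) {
        int[][] m = new int[4096][4096];
        System.out.println(sumRowMajor(m) + " " + sumColumnMajor(m));
    }
}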

There is still a place for some hand coding. Math libraries (say, the exp function) and the more important linear algebra operations (like matrix multiply) are still usually hand coded by experts who work for the hardware vendor (i.e. Intel, AMD, or IBM), but they probably only need a couple of top-notch assembler programmers per mega-computer corp.

I sometimes spend (waste?) time choosing a good name for a variable or a method so that it is not only precisely descriptive but also has a good linguistic style.

It goes a little further when I attempt to put the entire works (a program) into the same linguistic style. Sometimes my opinion changes and I revise the book. :)

Not that it takes that much time. It's rather rare. But I like to keep good grammar in my programs.

Time complexity. I spend way too much time optimizing things for worst case performance on a scale that is orders of magnitude larger than anything I know the program will realistically encounter.

I'm just too obsessive to let go of the 'but it could grow that big' bone, even when other design decisions preclude that from realistically happening.

However, in my defense, should the unrealistic become reality .. the optimization really isn't 'micro' any longer. Note, I didn't say impossible, but unrealistic.

Of course, requirements take precedence.
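
As a hypothetical sketch of that habit (the lookup table and names are invented): building an index for a table that will only ever hold a handful of entries, where the O(1) lookup never actually beats the plain linear scan.

// Hypothetical sketch of the "but it could grow that big" trap.
import java.util.*;

public class OverkillDemo {
    // Perfectly adequate for the handful of entries this table will ever hold.
    static String linearLookup(List<Map.Entry<String, String>> table, String code) {
        for (var entry : table) {
            if (entry.getKey().equals(code)) return entry.getValue();
        }
        return null;
    }

    public static void main(String[] args) {
        var table = List.of(Map.entry("USD", "$"), Map.entry("GBP", "£"));

        // The "optimized" version: extra structure and upkeep for no real gain
        // at this scale.
        Map<String, String> index = new HashMap<>();
        table.forEach(e -> index.put(e.getKey(), e.getValue()));

        System.out.println(linearLookup(table, "USD"));
        System.out.println(index.get("USD"));
    }
}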

I think I've wasted weeks worth of time fiddling with Java exception handlers.

This is a lot of work for code that fails two or three times per year. Should this be a warning or an info? Error or fatal? Is it really fatal if the process will be re-spawned in five minutes? Is it really a warning if everything is still in its default state?

Lots of navel gazing and discussion about the nature of an IO error. If you can't read a file over the network, is it a file error or a network error? And so on.

At one point I replaced all the string concatenations with String.format so the yearly file error wouldn't result in extra objects in the heap. That was a waste.
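
For illustration, a hedged sketch of the kind of change described (the class, paths, and messages are invented): both forms build the same message, and for an error path that fires a few times a year the allocation difference is irrelevant either way.

// Hypothetical before/after of the concatenation-to-String.format change.
public class LogMessageDemo {
    static String concatStyle(String path, int attempt) {
        return "Failed to read " + path + " on attempt " + attempt;
    }

    static String formatStyle(String path, int attempt) {
        return String.format("Failed to read %s on attempt %d", path, attempt);
    }

    public static void main(String[] args) {
        System.out.println(concatStyle("/data/export.csv", 3));
        System.out.println(formatStyle("/data/export.csv", 3));
    }
}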

I suppose I am concerned with LOCs too much. Not so much LOCs themselves, but rather the number of statements, and even more the number of duplicate statements. I am allergic to duplicate code. In general I love refactoring, but I suppose about 50% of what I do doesn't make the code significantly nicer.

Reading a file line by line instead of just reading the whole file into a string and processing the string in one take.

Sure, it makes a difference in execution speed but is rarely worth the extra lines of code. It makes the code far less maintainable and increases code size.

Yes, this probably doesn't make sense if the file is 3GB big but most of the files aren't that big (at least not the ones I'm working with ;-)).
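
A small sketch of the trade-off, assuming a Java 11+ environment and an invented input file: slurping the whole file versus streaming it line by line. The streaming version only starts to matter when files may be huge.

// Hypothetical comparison of whole-file vs line-by-line reading with java.nio.file.
import java.io.IOException;
import java.nio.file.*;

public class ReadFileDemo {
    // Simple: read the whole file and process it in one take.
    static long countNonEmptySimple(Path file) throws IOException {
        String text = Files.readString(file);
        return text.lines().filter(line -> !line.isBlank()).count();
    }

    // Line-by-line streaming: more machinery, only worth it for very large files.
    static long countNonEmptyStreaming(Path file) throws IOException {
        try (var lines = Files.lines(file)) {
            return lines.filter(line -> !line.isBlank()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of("example.txt"); // hypothetical input file
        System.out.println(countNonEmptySimple(file));
        System.out.println(countNonEmptyStreaming(file));
    }
}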

When I was writing assembly language it made sense to nitpick over bytes and cycles, but that was a long time ago and compilers have come a long way since then. I wouldn't try to manually out-optimize one now.

Similarly, when I switched to writing in C it was pretty easy to do a better job than the C compilers, but every year Borland or Microsoft or someone would release a new 'n improved one that kicked the previous one around the room. I started paying attention to the actual assembly code that the compiler was emitting and, danged if it wasn't writing some nice and tight code, unrolling loops, moving variables outside the loops, etc.

Nowadays I'm writing in much higher-level languages like Perl, Python and Ruby. I use some of the compiler tricks up front, like loop unrolling if it makes sense and moving static variables outside the loops, but I don't worry about it nearly as much because CPUs are a weeeeee bit faster now. If an app seems to be dragging unexpectedly then I'll use a profiler and see what I can find, and then if something can be improved I'll start benchmarking various ways I can think of to do it faster, and then see how it scales up.
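
The "benchmark various ways" step might look something like this rough harness (purely illustrative; serious Java benchmarking would use a tool like JMH to handle JIT warm-up and dead-code elimination properly).

// Hypothetical quick-and-dirty timing harness; not a substitute for JMH.
import java.util.function.LongSupplier;

public class QuickBench {
    static void time(String label, LongSupplier work) {
        work.getAsLong();                       // crude warm-up pass
        long start = System.nanoTime();
        long result = work.getAsLong();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + elapsedMs + " ms (result " + result + ")");
    }

    public static void main(String[] args) {
        int[] data = new int[10_000_000];
        time("simple sum", () -> {
            long total = 0;
            for (int v : data) total += v;
            return total;
        });
    }
}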

In general I try to be smart on how I write code, based on years of experience.

A micro-optimization: improving single-threaded performance of things that are parallelizable or that can simply be improved with new hardware.

If something is slow on a computer from 10 years ago, a cheaper solution is probably to buy a newer, faster computer rather than to waste programmer time optimizing. A bigger focus of any optimization effort, though, should be finding a way to use the 8 cores you can get today, or the 64 you'll be able to get in a couple of years, instead of agonizing over minutiae like the extra cost of garbage collection.
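
As a hedged sketch of that shift in focus (the workload and sizes are invented): the same computation run sequentially on one core and then spread across the available cores with a parallel stream.

// Hypothetical illustration: parallelizing instead of shaving single-threaded constants.
import java.util.stream.IntStream;

public class ParallelDemo {
    static double expensive(int n) {
        double x = n;
        for (int i = 0; i < 1_000; i++) x = Math.sin(x) + Math.cos(x);
        return x;
    }

    public static void main(String[] args) {
        // Sequential: one core does everything.
        double s1 = IntStream.range(0, 100_000)
                             .mapToDouble(ParallelDemo::expensive).sum();
        // Parallel: the common fork-join pool uses the cores you actually have.
        double s2 = IntStream.range(0, 100_000).parallel()
                             .mapToDouble(ParallelDemo::expensive).sum();
        System.out.println(s1 + " " + s2);
    }
}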

Micro-optimization in terms of runtime performance is hardly a closed problem even today (although it might be less common). I occasionally have to explain to people that the overhead of allocating an extra object every request or adding an extra function call is negligible.
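
For instance, a minimal sketch of the "extra object every request" conversation (the request-context class is invented; assumes Java 16+ for record syntax): the per-call allocation below costs nanoseconds on a modern JVM and can often be eliminated entirely by escape analysis.

// Hypothetical example of a small per-request allocation that is negligible in practice.
public class AllocationDemo {
    record RequestContext(String user, long timestampMillis) {}

    static String handle(String user) {
        // The "wasteful" extra object that triggers the discussion.
        RequestContext ctx = new RequestContext(user, System.currentTimeMillis());
        return "handled request for " + ctx.user();
    }

    public static void main(String[] args) {
        System.out.println(handle("alice"));
    }
}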

Aside from that, I also see programmers micro-optimize for simplicity (including myself). I have yet to work somewhere that I didn't have to explain the difference between simplicity and simplism.

I'm all for the don't-optimise-unless-really-necessary principle. And I'm all for the profile-first principle. And in principle, I will only optimise something that will make a needed difference.

That's in principle. But principle is a liar.

I have a habit of taking care with functions in frequently used libraries that I think will be used a lot in inner loops. And then, when you spot something in another function that isn't so often used...

Oh - and then there's the "I'm writing it - of course it's an important library" logic.

Oh - and BTW. I have never yet used a profiler to guide an optimisation. Don't have access to a commercial one, and never quite built up the motivation to figure out gprof.

Basically, I'm a hypocrite. These principles apply to other people, but I have too much fear of someone saying "but why did you write it that slow and crappy way?" so I can never really apply them properly to myself.

I'm not really as bad as I make out here, though. And I'm completely immune to self-deception, so you know I'm right about that!

EDIT

I should add - one of my major stupid hang-ups is call overheads. Not everywhere, of course, but in those I-think-it'll-be-used-a-lot-in-inner-loops cases (where I have no objective evidence of a problem). I have written nasty offsetof-based code to avoid making virtual method calls more than once. The result may even be slower, since it probably isn't a style of coding that optimisers are designed to deal with well - but it's cleverer ;-)

In the old days when computers had clock times measured in microseconds, memory measured in kilobytes, and primitive compilers, micro-optimization made some sense.

The speed and size of current generation computers and the quality of current generation compilers mean that micro-optimization is usually a total waste of time. The exceptions tend to be when you absolutely have to get maximum performance out of some computationally intensive code ... or you are writing for an embedded system with extremely limited resources.

Quoting the question: "but I'm looking for what programmers do to waste time, not to optimize runtime performance."

Twitter and Facebook spring to mind :-)

Licensed under: CC-BY-SA with attribution