c++ heuristic for estimating function inlining benefits

https://stackoverflow.com/questions/7107643

27-12-2020
|

Question

In c++, what is a good heuristic for estimating the compute time benefits of inlining a function, particularly when the function is called very frequently and accounts for >= 10% of the program's execution time (eg. the evaluation function of a brute force or stochastic optimization process). Even though inlining may be ultimately beyond my control, I am still curious.

Solution

There is no general answer. It depends on the hardware, the number and type of its arguments, and what is done in the function. And how often it is called, and where. On a Sparc, for example, arguments (and the return value) are passed in registers, and each function gets 16 new registers: if the function is complex enough, those new registers may avoid spilling that would occur if the function were inlined, and the non-inline version may end up faster than the inlined one. On an Intel, which is register poor, and passes arguments in registers, just the opposite might be true, for the same function in the same program. More generally, inlining may increase program size, reducing locality. Or for very simple functions, it may reduce program size; but that again depends on the architecture. The only possible way to know is to try both, measuring the time. And even then you'll only know for that particular program, on that particular hardware.

OTHER TIPS

A function call and return on some architectures take as few as one instruction each (although they're generally not RISC-like single-cycle instructions.) In general, you can compare that to the number of cycles represented by the body of the function. A simple property access might be only a single instruction, and so putting it into a non-inlined function will triple the number of instructions to execute it -- obviously a great candidate for inlining. On the other hand, a function that formats a string for printing might represent hundreds of instructions, so two more isn't going to make any difference at all.

If your bottleneck is in a recursive function, and assuming that the level of recursion is not minimal (i.e. average recursion is not just a few levels), you are better off in working with the algorithm in the function rather than with inlining.

Try, if possible, to transform the recursion into a loop or tail-recursion (that can be implicitly transformed into a loop by the compiler), or try to determine where in the function the cost is being spent. Try to minimize the impact of the internal operations (maybe you are dynamically allocating memory that could have auto storage duration, or maybe you can factor a common operation to be performed external to the function in a wrapper and passed in as an extra argument,...)

*EDIT after the comment that recursion was not intended, but rather iteration*

If the compiler has access to the definition of the function, it will make the right decision for you in most cases. If it does not have access to the definition, just move the code around so that it does see it. Maybe make the function a static function to provide an extra hint that it won't be used anywhere else, or even mark it as inline (knowing that this will not force inlining), but avoid using special attributes that will force inlining, as the compiler probably does it better than any simple heuristic that can be produced without looking at the code.

All inlining saves you is the entry/exit cost of the function, so it's only worth considering if the function does almost nothing. Certainly if the function itself contains a function call, it's probably not worth considering.

Even if the function does very little, it has to be called so much that it owns the program counter a significant percent of the time, before any speedup of the function would be noticeable.

The behaviour here is somewhat compiler dependant. With a recursive function obviously inlining behaviour can in theory be infinite. The 'inline' keyword is only a hint to the compiler, it can choose it ignore if it can't do anything with it. Some compilers will inline the recursive function to a certain depth.

As for the 'how much will this speed things up' - unfortunately we can't provide any sort of answer to that question as 'it depends' - how much work is the function doing vs the overhead of the function call mechanism itself. Why don't you set up a test and see?

Our experience, 20+ years of writing computationally intensive C++, is that inlining is no silver bullet. You really do need to profile your code to see whether inlining will increase performance. For us except for low level 2D and 3D point and vector manipulations inlining is a waste of time. You are far better off working out a better algorithm than trying to micromanage clock ticks.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow