Question

My function will be called thousands of times. If i want to make it faster, will changing the local function variables to static be of any use? My logic behind this is that, because static variables are persistent between function calls, they are allocated only the first time, and thus, every subsequent call will not allocate memory for them and will become faster, because the memory allocation step is not done.

Also, if the above is true, then would using global variables instead of parameters be faster to pass information to the function every time it is called? i think space for parameters is also allocated on every function call, to allow for recursion (that's why recursion uses up more memory), but since my function is not recursive, and if my reasoning is correct, then taking off parameters will in theory make it faster.

I know these things I want to do are horrible programming habits, but please, tell me if it is wise. I am going to try it anyway but please give me your opinion.

Was it helpful?

Solution

The overhead of local variables is zero. Each time you call a function, you are already setting up the stack for the parameters, return values, etc. Adding local variables means that you're adding a slightly bigger number to the stack pointer (a number which is computed at compile time).

Also, local variables are probably faster due to cache locality.

If you are only calling your function "thousands" of times (not millions or billions), then you should be looking at your algorithm for optimization opportunities after you have run a profiler.


Re: cache locality (read more here): Frequently accessed global variables probably have temporal locality. They also may be copied to a register during function execution, but will be written back into memory (cache) after a function returns (otherwise they wouldn't be accessible to anything else; registers don't have addresses).

Local variables will generally have both temporal and spatial locality (they get that by virtue of being created on the stack). Additionally, they may be "allocated" directly to registers and never be written to memory.

OTHER TIPS

The best way to find out is to actually run a profiler. This can be as simple as executing several timed tests using both methods and then averaging out the results and comparing, or you may consider a full-blown profiling tool which attaches itself to a process and graphs out memory use over time and execution speed.

Do not perform random micro code-tuning because you have a gut feeling it will be faster. Compilers all have slightly different implementations of things and what is true on one compiler on one environment may be false on another configuration.

To tackle that comment about fewer parameters: the process of "inlining" functions essentially removes the overhead related to calling a function. Chances are a small function will be automatically in-lined by the compiler, but you can suggest a function be inlined as well.

In a different language, C++, the new standard coming out supports perfect forwarding, and perfect move semantics with rvalue references which removes the need for temporaries in certain cases which can reduce the cost of calling a function.

I suspect you're prematurely optimizing, however, you should not be this concerned with performance until you've discovered your real bottlenecks.

Absolutly not! The only "performance" difference is when variables are initialised

    int anint = 42;
 vs
    static int anint = 42;

In the first case the integer will be set to 42 every time the function is called in the second case ot will be set to 42 when the program is loaded.

However the difference is so trivial as to be barely noticable. Its a common misconception that storage has to be allocated for "automatic" variables on every call. This is not so C uses the already allocated space in the stack for these variables.

Static variables may actually slow you down as its some aggresive optimisations are not possible on static variables. Also as locals are in a contiguous area of the stack they are easier to cache efficiently.

There is no one answer to this. It will vary with the CPU, the compiler, the compiler flags, the number of local variables you have, what the CPU's been doing before you call the function, and quite possibly the phase of the moon.

Consider two extremes; if you have only one or a few local variables, it/they might easily be stored in registers rather than be allocated memory locations at all. If register "pressure" is sufficiently low that this may happen without executing any instructions at all.

At the opposite extreme there are a few machines (e.g., IBM mainframes) that don't have stacks at all. In this case, what we'd normally think of as stack frames are actually allocated as a linked list on the heap. As you'd probably guess, this can be quite slow.

When it comes to accessing the variables, the situation's somewhat similar -- access to a machine register is pretty well guaranteed to be faster than anything allocated in memory can possible hope for. OTOH, it's possible for access to variables on the stack to be pretty slow -- it normally requires something like an indexed indirect access, which (especially with older CPUs) tends to be fairly slow. OTOH, access to a global (which a static is, even though its name isn't globally visible) typically requires forming an absolute address, which some CPUs penalize to some degree as well.

Bottom line: even the advice to profile your code may be misplaced -- the difference may easily be so tiny that even a profiler won't detect it dependably, and the only way to be sure is to examine the assembly language that's produced (and spend a few years learning assembly language well enough to know say anything when you do look at it). The other side of this is that when you're dealing with a difference you can't even measure dependably, the chances that it'll have a material effect on the speed of real code is so remote that it's probably not worth the trouble.

It looks like the static vs non-static has been completely covered but on the topic of global variables. Often these will slow down a programs execution rather than speed it up.

The reason is that tightly scoped variables make it easy for the compiler to heavily optimise, if the compiler has to look all over your application for instances of where a global might be used then its optimising won't be as good.

This is compounded when you introduce pointers, say you have the following code:

int myFunction()
{
    SomeStruct *A, *B;
    FillOutSomeStruct(B);
    memcpy(A, B, sizeof(A);
    return A.result;
}

the compiler knows that the pointer A and B can never overlap and so it can optimise the copy. If A and B are global then they could possibly point to overlapping or identical memory, this means the compiler must 'play it safe' which is slower. The problem is generally called 'pointer aliasing' and can occur in lots of situations not just memory copies.

http://en.wikipedia.org/wiki/Pointer_alias

Yes, using static variables will make a function a tiny bit faster. However, this will cause problems if you ever want to make your program multi-threaded. Since static variables are shared between function invocations, invoking the function simultaneously in different threads will result in undefined behaviour. Multi-threading is the type of thing you may want to do in the future to really speed up your code.

Most of the things you mentioned are referred to as micro-optimizations. Generally, worrying about these kind of things is a bad idea. It makes your code harder to read, and harder to maintain. It's also highly likely to introduce bugs. You'll likely get more bang for your buck doing optimizations at a higher level.

As M2tM suggestions, running a profiler is also a good idea. Check out gprof for one which is quite easy to use.

You can always time your application to truly determine what is fastest. Here is what I understand: (all of this depends on the architecture of your processor, btw)

C functions create a stack frame, which is where passed parameters are put, and local variables are put, as well as the return pointer back to where the caller called the function. There is no memory management allocation here. It usually a simple pointer movement and thats it. Accessing data off the stack is also pretty quick. Penalties usually come into play when you're dealing with pointers.

As for global or static variables, they're the same...from the standpoint that they're going to be allocated in the same region of memory. Accessing these may use a different method of access than local variables, depends on the compiler.

The major difference between your scenarios is memory footprint, not so much speed.

Using static variables can actually make your code significantly slower. Static variables must exist in a 'data' region of memory. In order to use that variable, the function must execute a load instruction to read from main memory, or a store instruction to write to it. If that region is not in the cache, you lose many cycles. A local variable that lives on the stack will most surely have an address that is in the cache, and might even be in a cpu register, never appearing in memory at all.

I agree with the others comments about profiling to find out stuff like that, but generally speaking, function static variables should be slower. If you want them, what you are really after is a global. Function statics insert code/data to check if the thing has been initialized already that gets run every time your function is called.

Profiling may not see the difference, disassembling and knowing what to look for might.

I suspect you are only going to get a variation as much as a few clock cycles per loop (on average depending on the compiler, etc). Sometimes the change will be dramatic improvement or dramatically slower, and that wont necessarily be because the variables home has moved to/from the stack. Lets say you save four clock cycles per function call for 10000 calls on a 2ghz processor. Very rough calculation: 20 microseconds saved. Is 20 microseconds a lot or a little compared to your current execution time?

You will likely get more a performance improvement by making all of your char and short variables into ints, among other things. Micro-optimization is a good thing to know but takes lots of time experimenting, disassembling, timing the execution of your code, understanding that fewer instructions does not necessarily mean faster for example.

Take your specific program, disassemble both the function in question and the code that calls it. With and without the static. If you gain only one or two instructions and this is the only optimization you are going to do, it is probably not worth it. You may not be able to see the difference while profiling. Changes in where the cache lines hit could show up in profiling before changes in the code for example.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top