Question

In multi-threaded embedded software (written in C or C++), a thread must be given enough stack space to complete its operations without overflowing. Correct sizing of the stack is critical in some real-time embedded environments, because (at least in some systems I've worked with) the operating system will NOT detect this for you.

Usually, the stack size for a new thread (other than the main thread) is designated at the time that thread is created (i.e. in an argument to pthread_create() or the like). Often, these stack sizes are hard-coded to values that are known to be good at the time the code was originally written or tested.
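For concreteness, here's the kind of hard-coded sizing I mean, as a minimal pthreads sketch (the constant and names are just illustrative):

#include <pthread.h>

#define THREAD_STACK_SIZE (16 * 1024)   /* hard-coded, "known good" at the time of writing */

static void *worker(void *arg)
{
    /* ... thread body ... */
    (void)arg;
    return NULL;
}

int start_worker(pthread_t *tid)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, THREAD_STACK_SIZE);   /* the fateful number */
    int rc = pthread_create(tid, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}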

However, future changes to the code often break the assumptions on which the hard-coded stack sizes were based, and one fateful day, your thread enters one of the deeper branches of its call graph and overflows the stack -- bringing down the whole system or silently corrupting memory.

I have personally seen this problem in code that declares struct instances on the stack inside the thread. When the struct is augmented to hold additional data, the stack usage of every function with such a local inflates accordingly, potentially allowing stack overflows to occur. I imagine this could be a huge problem for established codebases, where the full effects of adding fields to a structure cannot be known immediately (too many threads/functions to find all the places where that struct is used).
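A tiny made-up illustration of the effect:

/* Every function that keeps one of these as a local variable pays for it on
 * its own stack. Growing the struct later silently grows all of those frames. */
typedef struct {
    char payload[256];      /* later bumped to 2048 when a feature is added */
} message_t;

static void handle_message(void)
{
    message_t msg;          /* 256 bytes of this thread's stack today, 2048 tomorrow */
    /* ... fill and process msg ... */
    (void)msg;
}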

Since the usual response to "stack sizing" questions is "they're not portable", let's assume that the compiler, operating system, and processor are all known quantities for this investigation. Let's also assume recursion isn't used, so we're not dealing with the possibility of an "infinite recursion" scenario.

What are some reliable ways to estimate the necessary stack size for a thread? I'd prefer methods that are offline (static analysis) and automatic, but all ideas are welcome.


Solution

Runtime Evaluation

An online method is to paint the complete stack with a known value, such as 0xAA (or 0xAAAA, whatever your word width is). You can then check how deep the stack has ever grown by seeing how much of the painting is left untouched.

Have a look at this link for an explanation with illustration.

The advantage is that it's simple. The disadvantage is that you cannot be certain the stack usage seen during your testing ever reached the true worst case, so the real maximum may still exceed what you measured.
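A minimal sketch of the painting idea, assuming the stack is a plain array handed to the RTOS and grows downward from its top (names and sizes are illustrative):

#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 1024
#define PAINT       0xAAAAAAAAu

static uint32_t thread_stack[STACK_WORDS];   /* the area given to the thread */

/* Call once before the thread starts. */
void stack_paint(void)
{
    for (size_t i = 0; i < STACK_WORDS; i++)
        thread_stack[i] = PAINT;
}

/* High-water mark in bytes: scan from the low end (the end the stack grows
 * toward) and count how many painted words survive. */
size_t stack_high_water(void)
{
    size_t untouched = 0;
    while (untouched < STACK_WORDS && thread_stack[untouched] == PAINT)
        untouched++;
    return (STACK_WORDS - untouched) * sizeof(uint32_t);
}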

Static Evaluation

There are some static checkers, and I believe there is even a hacked gcc version that tries to do this. The only thing I can tell you is that static checking is very difficult to do in the general case.

Also have a look at this question.

OTHER TIPS

You can use a static analysis tool like StackAnalyzer, if your target fits the requirements.

If you want to spend significant money, you can use a commercial static analysis tool like Klocwork. Klocwork is primarily aimed at detecting software defects and security vulnerabilities, but it also has a tool called 'kwstackoverflow' that can be used to detect stack overflow within a task or thread. I'm using it on the embedded project that I work on and have had positive results. I don't think any tool like this is perfect, but these commercial tools are very good. Most of the tools I have come across struggle with function pointers. I also know that many compiler vendors, such as Green Hills, now build similar functionality right into their compilers. This is probably the best solution, because the compiler has intimate knowledge of all the details needed to make accurate decisions about the stack size.

If you have the time, I'm sure you could use a scripting language to make your own stack overflow analysis tool. The script would need to identify the entry point of the task or thread, generate a complete function call tree, and then calculate the amount of stack space that each function uses. I suspect there are free tools available that can generate a complete function call tree, which should make this easier. If you know the specifics of your platform, determining the stack space each function uses can be very easy. For example, the first assembly instruction of a PowerPC function is often a store word with update (stwu) instruction that adjusts the stack pointer by the amount the function needs. You can take the size in bytes right from that first instruction, which makes determining the total stack space used relatively easy.
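As a rough sketch of that PowerPC trick (the 32-bit D-form encoding is assumed here; verify against your compiler and ABI, since not every prologue starts this way):

#include <stdint.h>

/* Returns the stack bytes claimed by the prologue instruction, or 0 if it
 * is not the expected "stwu r1, -N(r1)". */
unsigned prologue_frame_size(uint32_t insn)
{
    unsigned opcode = insn >> 26;               /* primary opcode field   */
    unsigned rs     = (insn >> 21) & 0x1F;      /* source register        */
    unsigned ra     = (insn >> 16) & 0x1F;      /* base register          */
    int16_t  disp   = (int16_t)(insn & 0xFFFF); /* signed displacement    */

    if (opcode == 37 && rs == 1 && ra == 1 && disp < 0)  /* stwu r1,D(r1) */
        return (unsigned)(-disp);
    return 0;
}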

These types of analysis will all give you an approximation of the worst-case upper bound for stack usage, which is exactly what you want to know. Of course, pundits (like the ones I work with) might complain that you're allocating too much stack space, but they are dinosaurs who don't care about good software quality :)

One other possibility, although it doesn't calculate stack usage, is to use the memory management unit (MMU) of your processor (if it has one) to detect stack overflow. I have done this on VxWorks 5.4 using a PowerPC. The idea is simple: just put a page of write-protected memory at the very top of your stack. If you overflow, a processor exception will occur and you will quickly be alerted to the stack overflow problem. Of course, it doesn't tell you by how much you need to increase the stack size, but if you're good with debugging exception/core files you can at least figure out the calling sequence that overflowed the stack. You can then use this information to increase your stack size appropriately.
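The same idea can be sketched on a POSIX system (not VxWorks as above): allocate the thread's stack yourself, make the page at the growth end inaccessible, and hand the rest to pthread_create(). The sizes and names below are assumptions for illustration, not a drop-in implementation:

#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)      /* assumed per-thread stack budget */

static void *worker(void *arg) { (void)arg; /* ... thread body ... */ return NULL; }

int spawn_guarded_thread(pthread_t *tid)
{
    long page = sysconf(_SC_PAGESIZE);
    void *stack = mmap(NULL, STACK_SIZE + page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Guard page at the low end: stacks grow downward on most targets, so
     * an overflow hits this page and raises SIGSEGV instead of silently
     * corrupting whatever sits below the stack. */
    if (mprotect(stack, page, PROT_NONE) != 0)
        return -1;

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    /* Usable stack starts just above the guard page. */
    pthread_attr_setstack(&attr, (char *)stack + page, STACK_SIZE);
    int rc = pthread_create(tid, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}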

-djhaus

Not free, but Coverity does static analysis of the stack.

Static (off-line) stack checking is not as difficult as it seems. I have implemented it for our embedded IDE (RapidiTTy) — it currently works for ARM7 (NXP LPC2xxx), Cortex-M3 (STM32 and NXP LPC17xx), x86 and our in-house MIPS ISA compatible FPGA soft-core.

Essentially we use a simple parse of the executable code to determine the stack usage of each function. Most significant stack allocation is done at the start of each function; just be sure to see how it alters with different optimisation levels and, if applicable, ARM/Thumb instruction sets, etc. Remember also that tasks usually have their own stacks, and ISRs often (but not always) share a separate stack area!

Once you have the usage of each function, it's fairly easy to build up a call tree from the parse and calculate the maximum usage for every function. Our IDE generates schedulers (effectively thin RTOSes) for you, so we know exactly which functions are designated as 'tasks' and which are ISRs, so we can tell the worst-case usage for each stack area.
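The accumulation step is straightforward once per-function frame sizes and call edges have been extracted; a toy sketch (the data layout is invented for illustration, and recursion in the analysed code is assumed absent, as in the question):

#include <stddef.h>

struct func {
    unsigned frame_bytes;          /* stack used by this function's own frame */
    const struct func **callees;   /* NULL-terminated list of called functions */
};

/* Worst-case stack depth starting from f: its own frame plus the deepest
 * of its callees. Called once per task/ISR entry point. */
unsigned worst_case_stack(const struct func *f)
{
    unsigned deepest = 0;
    for (size_t i = 0; f->callees && f->callees[i]; i++) {
        unsigned d = worst_case_stack(f->callees[i]);
        if (d > deepest)
            deepest = d;
    }
    return f->frame_bytes + deepest;
}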

Of course, these figures are almost always over the actual maximum. Think of a function like sprintf that can use a lot of stack space, but varies enormously depending on the format string and parameters that you provide. For these situations you can also use dynamic analysis — fill the stack with a known value in your startup, then run in the debugger for a while, pause and see how much of each stack is still filled with your value (high watermark style testing).

Neither approach is perfect, but combining both will give you a fairly good picture of what the real-world usage is going to be like.

As discussed in the answer to this question, a common technique is to initialise the stack with a known value and then to run the code for a while and see where the pattern stops.

This is not an offline method but on the project that I am working on we have a debug command that reads the high water mark on all of the task stacks within the application. This outputs a table of the stack usage for each task and the amount of headroom available. Checking this data after a 24 hour run with lots of user interaction gives us some confidence that the defined stack allocations are "safe".

This works using the well-tried technique of filling the stacks with a known pattern and assuming that the only way it can be overwritten is by normal stack usage; if it is being written by any other means, a stack overflow is the least of your worries!

We tried to solve this problem on an embedded system at my work. It got crazy; there was just too much code (both our own and 3rd-party frameworks) to get any reliable answer. Luckily, our device was Linux-based, so we fell back to the standard behavior of giving every thread 2 MB and letting the virtual memory manager optimize the usage.

Our one issue with this solution was that one of the 3rd-party tools performed an mlock on its entire memory space (presumably to improve performance). This caused all 2 MB of stack for each of its threads (75-150 of them) to be paged in. We lost half of our memory space until we figured it out and commented out the offending line.

Sidenote: Linux's virtual memory manager (VMM) allocates RAM in 4 KB pages. When a new thread asks for 2 MB of address space for its stack, the VMM assigns bogus memory pages to all but the topmost page. When the stack grows into a bogus page, the kernel detects a page fault and swaps the bogus page for a real one (which consumes another 4 KB of actual RAM). This way a thread's stack can grow to any size it needs (as long as it's less than 2 MB) and the VMM will ensure only a minimal amount of memory is used.

Apart from the suggestions already made, I would like to point out that in embedded systems you often have to control stack usage tightly, because you have to keep the stack size reasonably small.

In a sense, using stack space is a bit like allocating memory, but without an (easy) way to determine whether your allocation succeeded, so not controlling stack usage will result in a never-ending struggle to figure out why your system is crashing yet again. So, for instance, if your system allocates memory for large local variables on the stack, either allocate that memory with malloc() instead or, if you can't use malloc(), write your own memory handler (which is a simple enough task; a sketch follows the examples below).

No-no:

void func(myMassiveStruct_t par)   /* the argument is copied onto the stack */
{
  myMassiveStruct_t tmpVar;        /* and a second copy lives on the stack too */
}

Yes-yes:

void func (myMassiveStruct_t *par)
{
  myMassiveStruct_t *tmpVar;
  /* The large object lives on the heap (or your own pool), not on the stack. */
  tmpVar = (myMassiveStruct_t*) malloc (sizeof(myMassiveStruct_t));
  if (tmpVar == NULL)
    return;    /* unlike a stack overflow, a failed allocation can be handled */
  /* ... use tmpVar ... */
  free (tmpVar);
}

Seems pretty obvious but often isn't - especially when you can't use malloc().
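When malloc() really is off limits, the "write your own memory handler" suggestion can be as small as a fixed pool carved up at startup; a minimal sketch (the names and pool size are illustrative, and there is deliberately no free()):

#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4096

/* The union forces worst-case alignment for the pool's first byte. */
static union {
    uint8_t     bytes[POOL_SIZE];
    long double force_alignment;
} pool;
static size_t pool_used;

/* Bump allocator: blocks are handed out once and kept for the lifetime of
 * the system, which is a common pattern in small embedded designs. */
void *pool_alloc(size_t bytes)
{
    /* Round up so later allocations stay pointer-aligned. */
    bytes = (bytes + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    if (bytes > POOL_SIZE - pool_used)
        return NULL;
    void *p = &pool.bytes[pool_used];
    pool_used += bytes;
    return p;
}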

Of course you will still have problems; this helps but doesn't solve everything. It will, however, make estimating stack sizes easier in the future: once you have found a good size for your stacks, running out of stack space again after later code modifications points you at a specific class of problems (call chains that have grown too deep, for one).

Not 100% sure, but I think this can also be done another way: if you have a JTAG port exposed, you can connect to Trace32 and check the maximum stack usage. For this, though, you will have to start with a fairly large arbitrary stack size.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow