Question

I'm working on a large server application written in C++. This server may need to run for months without restarting. Fragmentation is already a suspected issue here, since our memory consumption increases over time. So far the measurement has been to compare private bytes with virtual bytes and analyze the difference between those two numbers.

My general approach to fragmentation is to leave it to analysis. I take the same view of other things like general performance and memory optimizations: you have to back up the changes with analysis and proof.

I'm noticing during code reviews and discussions that memory fragmentation is one of the first things that comes up. It's almost as if there's a huge fear of it now, and there's a big initiative to "prevent fragmentation" ahead of time. Code changes are requested that seem intended to reduce or prevent memory fragmentation problems. I tend to disagree with these right off the bat, since they seem like premature optimization to me: I would be sacrificing code cleanliness/readability/maintainability/etc. in order to satisfy these changes.

For example, take the following code:

std::stringstream s;
s << "This" << "Is" << "a" << "string";

Above, the number of allocations the stringstream makes is unspecified; it could be 4 allocations, or just 1. So we can't optimize based on that alone, but the general consensus is to either use a fixed buffer or somehow modify the code to potentially use fewer allocations. I don't really see the stringstream expanding itself here as a huge contributor to memory problems, but maybe I'm wrong.

General improvement suggestions to code above are along the lines of:

std::stringstream s;
s << "This is a string"; // Combine it all to 1 line, supposedly less allocations?

There is also a huge push to use the stack over the heap wherever possible.
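For illustration, the kind of change that usually gets requested looks roughly like this (a sketch only; the fixed buffer size is arbitrary):

#include <cstdio>

char buffer[64];                         // fixed stack buffer instead of a heap-backed stream
std::snprintf(buffer, sizeof(buffer), "%s%s%s%s", "This", "Is", "a", "string");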

Is it possible to be preemptive about memory fragmentation in this way, or is this simply a false sense of security?


Solution

It's not premature optimization if you know in advance that you need to be low-fragmentation, you have measured in advance that fragmentation is an actual problem for you, and you know in advance which segments of your code are relevant. Performance is a requirement, but blind optimization is bad in any situation.

However, the superior approach is to use a fragmentation-free custom allocator, such as an object pool or a memory arena, which guarantees no fragmentation. For example, in a physics engine you can use a memory arena for all per-tick allocations and empty it at the end of each tick, which is not only ludicrously fast (even faster than _alloca on VS2010) but also extremely memory efficient and low fragmentation.
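To illustrate the idea, here is a minimal sketch of such a per-tick arena (the class name, capacity handling, and alignment are simplified, not a production implementation):

#include <cstddef>
#include <vector>

class TickArena {
public:
    explicit TickArena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    void* allocate(std::size_t bytes) {
        // Bump-pointer allocation: no per-allocation bookkeeping, no fragmentation.
        std::size_t aligned = (offset_ + alignof(std::max_align_t) - 1)
                              & ~(alignof(std::max_align_t) - 1);
        if (aligned + bytes > buffer_.size()) return nullptr;   // arena exhausted for this tick
        offset_ = aligned + bytes;
        return buffer_.data() + aligned;
    }

    void reset() { offset_ = 0; }   // "empty it at the end" of the tick

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};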

OTHER TIPS

It is absolutely reasonable to consider memory fragmentation at the algorithmic level. It is also reasonable to allocate small, fixed-sized objects on the stack to avoid the cost of an unnecessary heap allocation and free. However, I would definitely draw the line at anything that makes the code harder to debug, analyze, or maintain.

I would also be concerned that there are a lot of suggestions that are just plain wrong. Probably half of the things people typically say should be done "to avoid memory fragmentation" have no effect whatsoever, and a sizable fraction of the rest are likely harmful.

For most realistic, long-running server-type applications on typical modern computing hardware, fragmentation of user-space virtual memory just won't be an issue with simple, straightforward coding.

I think it is more a best practice than a premature optimization. If you have a test suite, you can create a set of memory tests to run overnight, for example, measuring memory, performance, etc. You can then read the reports and fix errors where possible.

The problem with small optimizations is that they change the code into something different that has the same business logic, like using a reverse for loop because it is supposedly faster than a regular one. Your unit tests will probably guide you to optimize some points without side effects.

Too much concern about memory fragmentation before you actually encounter it is clearly premature optimization; I wouldn't take it too much into consideration in the initial design. Things like good encapsulation are more important (since they will allow you to change the memory representation later, if you need to).

On the other hand, it is good design to avoid unnecessary allocation and to use local variables instead of dynamic allocation when possible. Not just for reasons of fragmentation, but also for reasons of program simplicity: C++ tends to prefer value semantics in general, and programs using value semantics (copy and assignment) are more natural than those using reference semantics (dynamic allocation and passing pointers around).
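As a small illustrative sketch (not code from any particular project), the value-semantics version keeps the object local and needs no cleanup, while the pointer version forces an extra heap allocation and manual ownership:

#include <string>

std::string joinValue(const std::string& a, const std::string& b) {
    return a + b;                       // value semantics: result returned by value, no manual cleanup
}

std::string* joinPointer(const std::string& a, const std::string& b) {
    return new std::string(a + b);      // reference semantics: extra heap allocation, caller must delete
}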

I think you should not be solving the fragmentation problem before you actually encounter it, but at the same time your software should be designed to allow easy integration of such a solution when you do. Since the solution is a custom memory allocator, plugging one into your code (through operator new/delete and the Allocator parameters of your containers) should be a matter of changing one line somewhere in your config.h file, and absolutely not of going through all instantiations of all containers and such. Another point in support of this is that 99% of today's complex software is multi-threaded, and memory allocation from different threads leads to synchronization problems and sometimes false sharing. The answer to these problems is, again, a custom memory allocator.
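A sketch of what that single point of configuration might look like (config.h here, plus MyPoolAllocator and the alias names, are hypothetical; the only assumption is that a standard-compatible custom allocator exists):

// config.h (hypothetical): switch the allocator for the whole code base in one place
#include <memory>
#include <string>
#include <vector>
#include "my_pool_allocator.h"   // hypothetical custom allocator, standard-compatible

#ifdef USE_CUSTOM_ALLOCATOR
template <typename T> using AppAllocator = MyPoolAllocator<T>;
#else
template <typename T> using AppAllocator = std::allocator<T>;
#endif

// Containers throughout the code use these aliases instead of the raw types.
template <typename T> using AppVector = std::vector<T, AppAllocator<T>>;
using AppString = std::basic_string<char, std::char_traits<char>, AppAllocator<char>>;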

So if your design supports a custom allocator, then you should not accept code changes that are sold to you as "fragmentation-freeing" ones, not until you profile your app and see for yourself that the patch really does decrease the number of DTLB or LLC misses by packing the data better. If, however, the design does not allow for a custom allocator, then introducing one should be the first step before any other "memory fragmentation eliminating" code changes.

From what I remember about its internal design, the Intel Threading Building Blocks scalable allocator could be tried for both goals: increasing memory allocation scalability and decreasing memory fragmentation.
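Assuming TBB is available, trying its allocator on a given container is a one-line change, for example:

#include <vector>
#include <tbb/scalable_allocator.h>

// A std::vector whose storage comes from the TBB scalable allocator
// (per-thread pools, which also reduce allocator contention between threads).
std::vector<int, tbb::scalable_allocator<int>> samples;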

Another small point: regarding your stringstream example and the policy of packing allocations together as much as possible, my understanding is that in some cases this will cause memory fragmentation rather than solve it. Packing all allocations together means requesting large contiguous chunks of memory, which might end up scattered, so that other similar large-chunk requests cannot fill in the gaps.

One more point I would like to mention: why don't you try some sort of garbage collector? You could invoke it after a certain threshold or after a certain time period, and it would automatically collect the unused memory once that threshold is reached.

Also, regarding fragmentation, try to allocate dedicated storage for the different types of objects and manage it yourself in your code.

I.e., if you have, say, 5 types of objects (of classes A, B, C, D, and E), you can allocate space at the beginning for, say, 1000 objects of each type, in cacheA, cacheB, ..., cacheE.

So you will avoid many calls to malloc and new, and fragmentation will be much lower. The code will also stay as readable as before, since you just need to implement something like myAlloc, which allocates from your cacheA, cacheB, etc.
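A rough sketch of what one such per-type cache might look like (the pool class, the myAlloc-style interface, and the free-list handling below are simplified illustrations, not a drop-in implementation):

#include <cstddef>
#include <vector>

// Hypothetical fixed-size cache, preallocated up front for one object type.
template <typename T, std::size_t N>
class ObjectCache {
public:
    ObjectCache() : storage_(N) {                     // one contiguous block, allocated once
        for (std::size_t i = 0; i < N; ++i)
            free_.push_back(&storage_[i]);
    }

    T* allocate() {                                   // the "myAlloc" for this type
        if (free_.empty()) return nullptr;            // cache exhausted; could fall back to new
        T* p = free_.back();
        free_.pop_back();
        return p;
    }

    void release(T* p) { free_.push_back(p); }

private:
    std::vector<T> storage_;                          // requires T to be default-constructible
    std::vector<T*> free_;
};

// Usage, following the description above:
//   ObjectCache<A, 1000> cacheA;
//   A* a = cacheA.allocate();  ...  cacheA.release(a);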

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow