Question

I have written a converter that takes OpenStreetMap XML files and converts them to a binary runtime rendering format that is typically about 10% of the original size. Input files are typically 3 GB and larger. The input is not loaded into memory all at once, but streamed as points and polygons are collected; then a BSP is run on them and the file is output. Recently, on larger files, it runs out of memory and dies (the one in question has 14 million points and 1 million polygons). Typically my program is using about 1 GB to 1.2 GB of RAM when this happens. I've tried increasing virtual memory from 2 to 8 GB (on XP), but this change had no effect. Also, since this code is open source, I would like to have it work regardless of the available RAM (albeit more slowly); it runs on Windows, Linux and Mac.

What techniques can I use to avoid having it run out of memory? Processing the data in smaller sub-sets and then merging the final results? Using my own virtual memory type of handler? Any other ideas?

Solution

First, on a 32-bit system you will always be limited to 4 GB of address space, no matter what the pagefile settings are. (And of that, only 2 GB is available to your process on Windows by default; on Linux you'll typically have around 3 GB available.)

So the first obvious solution is to switch to a 64-bit OS, and compile your application for 64-bit. That gives you a huge virtual memory space to use, and the OS will swap data in and out of the pagefile as necessary to keep things working.

Second, allocating smaller chunks of memory at a time may help. It's often easier to find four free 256 MB chunks of address space than one free 1 GB chunk.

Third, split up the problem. Don't process the entire dataset at once, but try to load and process only a small section at a time.

OTHER TIPS

Have you checked to ensure you aren't leaking memory anywhere?

Since your program is portable to Linux, I suggest running it under Valgrind to make sure.

It sounds like you're already taking a SAX-based approach to the XML processing (loading the XML as you go instead of all at once).
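
For reference, here is roughly what that looks like with Expat (my assumption; libxml2's SAX interface works much the same way), feeding the file through in fixed-size buffers so memory use stays bounded regardless of file size:

    // Minimal streaming (SAX-style) parse with Expat -- a sketch, not the
    // poster's actual code. OSM-specific handling of <node> and <way>
    // elements would go into the callbacks.
    #include <expat.h>
    #include <cstdio>
    #include <cstddef>

    static void XMLCALL onStartElement(void* userData, const XML_Char* name,
                                       const XML_Char** atts) {
        // e.g. collect lat/lon attributes of <node> here instead of building a DOM
        (void)userData; (void)name; (void)atts;
    }

    static void XMLCALL onEndElement(void* userData, const XML_Char* name) {
        (void)userData; (void)name;
    }

    int parseFile(const char* path) {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return 1;

        XML_Parser parser = XML_ParserCreate(NULL);
        XML_SetElementHandler(parser, onStartElement, onEndElement);

        char buf[64 * 1024];
        std::size_t n;
        bool ok = true;
        while ((n = std::fread(buf, 1, sizeof(buf), f)) > 0) {
            // Feed one chunk at a time; memory stays bounded by the buffer size.
            if (XML_Parse(parser, buf, (int)n, 0) == XML_STATUS_ERROR) { ok = false; break; }
        }
        if (ok) XML_Parse(parser, buf, 0, 1);   // signal end of document

        XML_ParserFree(parser);
        std::fclose(f);
        return ok ? 0 : 1;
    }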

The solution is almost always to change the algorithm so that it cuts the problem into smaller parts. Physically don't allocate as much memory at one time, read in only what you need, process it, then write it out.

You can sometimes extend your effective memory by spilling data to the hard drive when your algorithm needs it.

If you can't split up your algorithm, you probably want something like memory-mapped files.
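
On the POSIX side (Linux/Mac), that is just mmap. A rough sketch, assuming the intermediate data has already been written to a scratch file (the file name and read/write mode are illustrative):

    // Map a scratch file into the address space so the OS pages it in and out
    // on demand. Sketch only; "scratch.bin" and the RW mode are illustrative.
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstddef>

    void* mapScratchFile(const char* path, std::size_t* lengthOut) {
        int fd = open(path, O_RDWR);
        if (fd < 0) return NULL;

        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return NULL; }

        // Only the pages you actually touch occupy physical RAM; on 32-bit you
        // would map smaller windows of the file rather than all of it at once.
        void* p = mmap(NULL, (std::size_t)st.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);                       // the mapping stays valid after close()
        if (p == MAP_FAILED) return NULL;

        *lengthOut = (std::size_t)st.st_size;
        return p;                        // release later with munmap(p, *lengthOut)
    }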

In the worst case you can try to use something like VirtualAlloc if you are on a Windows system. If you are on a 32-bit system, you can try to use something like Physical Address Extension (PAE).

You could also consider placing input limits on your program, and having different limits for 32-bit and 64-bit systems.

I suspect your memory issues are from keeping the BSP tree in memory. So keep the BSP on disk and hold only some chunks of it in memory. This should be fairly easy with a BSP, as the structure lends itself to it better than some other tree structures, and the logic should be simple. To be both efficient and memory friendly, you could have a cache with dirty flags, with the cache size set to the available memory less a bit of breathing room.
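
A rough sketch of that idea follows. The node layout, the file format and the eviction policy are placeholders of my own, not anything from the question; the point is just that nodes live in a file, get pulled in by index on demand, and only get written back when their dirty flag is set:

    // Disk-backed cache of BSP nodes with dirty flags -- a sketch only.
    // BspNode's fields and the flat node file are hypothetical.
    #include <cstdio>
    #include <cstdint>
    #include <cstddef>
    #include <unordered_map>

    struct BspNode {
        int32_t left, right;          // child node indices, -1 for none
        float   plane[4];             // splitting plane
        int32_t firstPoly, polyCount;
    };

    class NodeCache {
    public:
        // The node file is assumed to exist and be pre-sized.
        NodeCache(const char* path, std::size_t maxResident)
            : file_(std::fopen(path, "r+b")), maxResident_(maxResident) {}

        // Read access: pulls the node from disk if it isn't resident.
        BspNode& get(uint32_t index) {
            auto it = cache_.find(index);
            if (it != cache_.end()) return it->second.node;
            if (cache_.size() >= maxResident_) evictOne();

            Entry& slot = cache_[index];
            slot.dirty = false;
            std::fseek(file_, (long)(index * sizeof(BspNode)), SEEK_SET);
            if (std::fread(&slot.node, sizeof(BspNode), 1, file_) != 1) {
                // handle the I/O error in real code
            }
            return slot.node;
        }

        // Write access: same as get(), but marks the node for write-back.
        BspNode& getMutable(uint32_t index) {
            BspNode& node = get(index);            // ensures the node is resident
            cache_.find(index)->second.dirty = true;
            return node;
        }

    private:
        struct Entry { BspNode node; bool dirty; };

        void evictOne() {
            auto it = cache_.begin();              // naive policy; LRU would be better
            if (it->second.dirty) {                // write back only modified nodes
                std::fseek(file_, (long)(it->first * sizeof(BspNode)), SEEK_SET);
                std::fwrite(&it->second.node, sizeof(BspNode), 1, file_);
            }
            cache_.erase(it);
        }

        std::FILE* file_;
        std::size_t maxResident_;
        std::unordered_map<uint32_t, Entry> cache_;
    };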

Assuming you are using Windows XP, if you are only just over your memory limit and do not desire or have the time to rework the code as suggested above, you can add the /3GB switch to your boot.ini file, and then it is just a matter of setting a linker switch (/LARGEADDRESSAWARE with the Visual C++ linker) to get an extra 1 GB of memory.

You have to understand that virtual memory is different from "RAM": the amount of virtual memory you're using is the total amount you've reserved, while real memory (on Windows it's called the working set) is memory that you've actually modified or locked.

As someone else pointed out, on 32-bit Windows platforms the limit on virtual memory is 2 gigabytes unless you set the special flag for 3 gigabytes and can ensure that your code and any libraries you use treat pointers as unsigned.

So my advice would be to either force users to 64-bit, or to monitor your virtual memory use and cap your maximum block size to something that comfortably fits inside the limits imposed by 32-bit operating systems.

I've slammed into the 32-bit wall in Windows, but have no experience with working around these limitations in Linux so I've only talked about the Windows side of things.

On 32-bit XP your maximum program address space is 2 GB. Then you have fragmentation due to DLLs and drivers loading into your address space. Finally, you have the problem of your heap fragmenting.

Your best move is just to get it over with and run as a 64-bit process (on a 64-bit system). Suddenly all of these problems go away. You can use a better heap to mitigate heap fragmentation effects, and you can try using VirtualAlloc to grab your memory in one big contiguous chunk (and then you get to manage it from there!) to discourage DLLs/drivers from fragmenting it.
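
The reserve-then-commit pattern with VirtualAlloc looks roughly like this (Windows-only sketch; the 1 GB reservation and 16 MB commit size are illustrative):

    // Reserve one big contiguous region of address space up front, then commit
    // pages as they are actually needed. Sketch; the sizes are illustrative.
    #include <windows.h>

    int main() {
        const SIZE_T reserveSize = 1024u * 1024u * 1024u;  // 1 GB of address space
        const SIZE_T commitChunk = 16u * 1024u * 1024u;    // commit 16 MB at a time

        // MEM_RESERVE claims address space only -- no RAM or pagefile is used yet.
        char* base = (char*)VirtualAlloc(NULL, reserveSize, MEM_RESERVE, PAGE_NOACCESS);
        if (!base) return 1;

        // Commit the next chunk whenever your own allocator runs past what is
        // already committed.
        SIZE_T committed = 0;
        if (!VirtualAlloc(base + committed, commitChunk, MEM_COMMIT, PAGE_READWRITE))
            return 1;
        committed += commitChunk;

        // ... hand out memory from [base, base + committed) yourself ...

        VirtualFree(base, 0, MEM_RELEASE);                 // releases the whole region
        return 0;
    }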

Finally, you can split your BSP across processes. Complicated and painful, and frankly just putting it on disk would be easier, but in theory you could get better performance by having a group of processes exchanging information, if you can keep everything resident (and assuming you can manage memory smarter than the OS handles file buffering... which is a big if). Each process would need far less memory and therefore shouldn't run into the 2 GB address-space limit. Of course, you'll burn through RAM/swap a lot faster.

You can mitigate the effects of address-space fragmentation by allocating smaller chunks. This has other nasty side effects, but you can follow a backoff policy where you grab smaller and smaller chunks of memory if an allocation fails. Frequently this simple approach will get you a program that works when it otherwise wouldn't, while the rest of the time it performs as well as it can.
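
The backoff itself can be as simple as halving the request until something fits (a sketch; the 1 MB floor is arbitrary, and the caller has to cope with getting less than it asked for):

    // Try progressively smaller blocks until one fits into a free region of the
    // fragmented address space. Sketch only.
    #include <cstdlib>
    #include <cstddef>

    void* allocWithBackoff(std::size_t wanted, std::size_t* got) {
        std::size_t size = wanted;
        while (size >= 1024 * 1024) {              // 1 MB floor, chosen arbitrarily
            if (void* p = std::malloc(size)) {
                *got = size;                       // may be less than 'wanted'
                return p;
            }
            size /= 2;                             // back off and retry smaller
        }
        *got = 0;
        return NULL;                               // genuinely out of memory
    }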

Boy, doesn't 64-bit computing just sound so much nicer than the other choices?

How are you allocating memory for points? Are you allocating points one at a time (e.g. pt = new Point)? Depending on the size of a point, some memory may get wasted. For example, on Windows heap memory is allocated in multiples of 16 bytes, so even if you try to allocate 1 byte, the OS will actually allocate 16 bytes.

If this is the case, using a custom memory allocator may help. You can do a quick check using an STL allocator (overload the new operator for the Point class and use the allocator to obtain memory rather than malloc or the default new operator).
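
For illustration, here is a minimal fixed-size pool behind a class-level operator new/delete for a hypothetical Point (my own sketch, not the actual class): it hands out 16-byte slots from large blocks, so the per-allocation overhead of the default heap disappears. Blocks are never returned to the OS until the program exits, which is fine for a batch converter.

    // Minimal free-list pool behind Point's operator new/delete -- a sketch.
    // Assumes Point stays at two doubles (16 bytes); not thread-safe.
    #include <cstddef>
    #include <vector>

    struct Point {
        double lat, lon;

        static void* operator new(std::size_t) { return pool().allocate(); }
        static void  operator delete(void* p)  { pool().release(p); }

    private:
        struct Pool {
            // Each slot is big enough for a Point and doubles as a free-list link.
            union Slot { Slot* next; double payload[2]; };

            void* allocate() {
                if (!freeList_) grow();
                Slot* s = freeList_;
                freeList_ = s->next;
                return s;
            }

            void release(void* p) {
                Slot* s = static_cast<Slot*>(p);
                s->next = freeList_;
                freeList_ = s;
            }

            void grow() {
                const std::size_t slotsPerBlock = 4096;
                Slot* block = new Slot[slotsPerBlock];  // one heap call per 4096 points
                blocks_.push_back(block);
                for (std::size_t i = 0; i < slotsPerBlock; ++i) {
                    block[i].next = freeList_;
                    freeList_ = &block[i];
                }
            }

            Slot* freeList_ = nullptr;
            std::vector<Slot*> blocks_;                 // retained for the program's lifetime
        };

        static Pool& pool() { static Pool p; return p; }
    };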

You may not be allocating and deallocating memory in an optimum manner. As others have pointed out, you may be leaking memory and not knowing it. Debugging and optimizing memory allocation will take time.

If you don't want to spend time optimizing memory usage, why not try the Conservative Garbage Collector? It's a plug-in replacement for malloc()/new and free(). In fact, free() is a no-op, so you can just remove those calls from your program. If, instead, you hand-optimize your program and manage a pool of memory as previously suggested, you'll end up doing a lot of the work that the CGC already does for you.
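
If that route appeals, the Boehm collector (libgc) is the usual implementation. A tiny sketch of what usage looks like, assuming libgc is installed and linked with -lgc (the header is sometimes installed as <gc/gc.h> depending on the distribution):

    // Allocating through the Boehm conservative GC instead of malloc/new.
    // Sketch only; assumes libgc is available (link with -lgc).
    #include <gc.h>

    struct Point { double lat, lon; };

    int main() {
        GC_INIT();                                        // once, at startup

        for (int i = 0; i < 1000000; ++i) {
            Point* p = (Point*)GC_MALLOC(sizeof(Point));  // no matching free needed
            p->lat = 0.0;
            p->lon = 0.0;
            // once p becomes unreachable, the collector reclaims the memory
        }
        return 0;
    }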

You need to stream your output as well as your input. If your output format is not stream-oriented, consider doing a second pass. For example, if the output file starts with a checksum/size of the data, leave space for it on the first pass and seek back and write to that space later.
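
The leave-space-and-patch trick looks something like this (the header fields and file name are illustrative, not the actual format):

    // Stream the records out, then seek back and patch the header fields.
    // Sketch only; "map.bin" and the two header fields are illustrative.
    #include <fstream>
    #include <cstdint>

    int main() {
        std::ofstream out("map.bin", std::ios::binary);

        // Write zero placeholders where the size/checksum will eventually go.
        uint64_t recordCount = 0, checksum = 0;
        std::streampos headerPos = out.tellp();
        out.write(reinterpret_cast<const char*>(&recordCount), sizeof(recordCount));
        out.write(reinterpret_cast<const char*>(&checksum), sizeof(checksum));

        // ... stream every record as it is produced, updating recordCount and
        //     checksum along the way, never holding the whole dataset ...

        // "Second pass" over the header only: seek back and fill in real values.
        out.seekp(headerPos);
        out.write(reinterpret_cast<const char*>(&recordCount), sizeof(recordCount));
        out.write(reinterpret_cast<const char*>(&checksum), sizeof(checksum));
        return 0;
    }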

It sounds like you are doing a text-to-binary conversion, so why do you need to have the entire data set in memory?
Can't you just read a primitive from the text (XML) and then save it to the binary stream?

If you want to be memory-size independent, you need a size-independent algorithm. No matter how much RAM you have, if you don't have your memory usage under control, you're going to bump into the limit.

Take a look at the smallest chunk of information you can possibly use to produce a bit of output. Then think of a way to divide the input into chunks of this size.

Now that sounds easy, doesn't it? (Glad I don't have to do it :) )

You don't need to switch to 64-bit machines, nor do you need most of the thousand things suggested by others. What you need is a more thoughtful algorithm.

Here are some things you can do to help out with this situation:

  • If you're on Windows, utilize file maps (see the sketch after this list). This gives you access to the file through a single buffer pointer as though you had read the whole file into memory, just without actually doing that. Recent versions of the Linux kernel have a similar mechanism.
  • If you can, and it looks like you could, scan the file sequentially and avoid creating an in-memory DOM. This will greatly decrease your load-time as well as memory requirements.
  • Use pooled memory! You will probably have many tiny objects, such as nodes, points and whatnot. Use a memory pool to help out (I'm assuming you're using an unmanaged language; search for pooled allocation and memory pools).
  • If you're using a managed language, at least move this particular part into an unmanaged language and take control of the memory and file reading. Managed languages have a non-trivial overhead both in memory footprint and performance. (Yes, I know this is tagged "C++"...)
  • Attempt to design an in-place algorithm, where you read and process only the minimum amount of data at a time, so your memory requirements would go down.
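
For the first point, here is a Windows file-mapping sketch (error handling trimmed; the function name is mine, and the POSIX mmap sketch earlier on this page is the Linux/Mac equivalent):

    // View the input file through a pointer without reading it up front.
    // Windows-only sketch with minimal error handling.
    #include <windows.h>

    const char* mapWholeFile(const wchar_t* path, DWORD* sizeOut) {
        HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                                  OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (file == INVALID_HANDLE_VALUE) return NULL;

        HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READONLY, 0, 0, NULL);
        if (!mapping) { CloseHandle(file); return NULL; }

        // Pages are faulted in as they are touched. On 32-bit, map smaller views
        // (offset/length arguments of MapViewOfFile) instead of the whole file.
        const char* view = (const char*)MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
        *sizeOut = GetFileSize(file, NULL);

        CloseHandle(mapping);    // the view keeps the mapping alive
        CloseHandle(file);
        return view;             // release with UnmapViewOfFile(view)
    }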

Finally, let me point out that complex tasks require complex measures. If you think you can afford a 64-bit machine with 8GB of RAM, then just use "read file into memory, process data, write output" algorithm, even if it takes a day to finish.

A good technique for that is to store some instances in files, and then fetch them when you need to use them.

This technique is used by much open-source software, such as Doxygen, to stay scalable when a large amount of memory is needed.

This is an old question but, since I've recently done the same thing ....

There is no simple answer. In an ideal world you'd use a machine with a huge address space (i.e. 64-bit) and massive amounts of physical memory. A huge address space alone is not sufficient, or it'll just thrash. In that case, parse the XML file into a database and, with appropriate queries, pull out what you need. Quite likely this is what OSM itself does (I believe the whole world is about 330 GB).

In reality I'm still using 32-bit XP, for reasons of expediency.

It's a trade-off between space and speed. You can do pretty much anything in any amount of memory provided you don't care how long it takes. Using STL structures you can parse anything you want, but you'll soon run out of memory. You can define your own allocators that swap, but again, it'll be inefficient because the maps, vectors, sets, etc. do not really know what you are doing.

The only way I found to make it all work in a small footprint on a 32-bit machine was to think very carefully about what I was doing, about what was needed when, and to break the task into chunks. It is memory efficient (never uses more than ~100 MB) but not massively quick; then again, it doesn't matter: how often does one have to parse the XML data?

Licensed under: CC-BY-SA with attribution