Question

My C# .NET application has to read lots of files with electrical readings. There are several calculations to be done, and output files need to be generated. Due to the nature of the input, it would be very inefficient to write output after each input line or even after each file, so it would be better to do so after a chunk of files.

The reason for this is that each file contains a list of readings for several electrical units at one timestamp, but the output needs to be one file per unit.
So for instance the input would be 100 files, each containing e.g. volt, ampere, watt and a status code for one timestamp (2015_08_31_00_00_00.txt, 2015_08_31_00_00_05.txt ...).
The output should be per electrical unit, e.g. one file per day
(c://ampere/2015_08_31.txt, c://volt/2015_08_31.txt ...).

Note that this is a simplified picture of the application. In reality there are several different input and output formats, directory structures etc.

What I want to do is keep the calculated, ready-for-output values in memory and write them out according to different strategies, e.g. only after all input files have been processed.

As some strategies like this one can result in too much data in memory, I would like to monitor the memory usage and decide whether I need to output sooner.

I do know how to get the used memory of my program, but how do I safely get an estimate of the available memory?

As far as I understand, this is not a simple task due to memory fragmentation, paging, thrashing and so on. I do know that in this case I am hitting the 32-bit memory limit with the current build settings, but I would like a general answer that could also be applied to a 64-bit program.

How do I estimate when enough memory is still available to write to disk, but use a good amount of RAM to optimize I/O?

Update
I have not yet implemented this feature, so I cannot tell whether memory problems would occur under usual circumstances. Thanks to all your comments and answers I see that it isn't easily accomplished.

My current idea is to use a fixed input limit based on the size of the input files. This still would not protect against problems if other programs create heavy load. Each file is always just a few KB.
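The fixed-limit idea could be sketched as a simple chunking helper; all names here and the size-based threshold are illustrative assumptions, not a definitive implementation:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ChunkPlanner
{
    // Group (path, size) pairs into chunks whose total size stays at or
    // below maxChunkBytes; a single oversized file still forms its own chunk.
    public static List<List<string>> ChunkBySize(
        IEnumerable<(string Path, long Size)> files, long maxChunkBytes)
    {
        var chunks = new List<List<string>>();
        var current = new List<string>();
        long currentBytes = 0;

        foreach (var (path, size) in files)
        {
            if (current.Count > 0 && currentBytes + size > maxChunkBytes)
            {
                chunks.Add(current);       // current chunk is full, start a new one
                current = new List<string>();
                currentBytes = 0;
            }
            current.Add(path);
            currentBytes += size;
        }
        if (current.Count > 0) chunks.Add(current);
        return chunks;
    }

    // Convenience overload that reads the sizes from disk.
    public static List<List<string>> ChunkFilesBySize(
        IEnumerable<string> paths, long maxChunkBytes)
        => ChunkBySize(paths.Select(p => (p, new FileInfo(p).Length)), maxChunkBytes);
}
```

Each chunk can then be processed and flushed to the output files before the next one is read, keeping the in-memory working set bounded by the threshold.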

I will keep the SQLite idea in mind, but I will have to check whether I get an OK for it.


Solution

Unfortunately, there's really no good way to answer the question "how much RAM is my program using?" or "how close am I to hitting the OutOfMemoryException wall?", for a number of reasons. It's a lot more complicated than it looks like it should be.

One thing you might want to do, though, is make sure that you're not holding on to the input data unnecessarily. For example, depending on how you're processing the data, it could be feasible to read the file, and perform the relevant calculations, one line at a time, or one small batch of lines at a time. If you hold on to aggregate results, but don't keep the input data around, that should keep the memory usage down.
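For illustration, here is a minimal sketch of that line-at-a-time approach; the semicolon-separated line layout and every name in it are assumptions, not the asker's real format:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static class ReadingsAggregator
{
    // Assumed (hypothetical) line layout: "volt;ampere;watt;statusCode".
    static readonly string[] Units = { "volt", "ampere", "watt" };

    // Folds one file's lines into running per-unit totals without
    // retaining the raw input. Pair this with File.ReadLines(path),
    // which enumerates the file lazily instead of loading it whole.
    public static void Accumulate(IEnumerable<string> lines,
                                  IDictionary<string, double> totals)
    {
        foreach (var line in lines)
        {
            var fields = line.Split(';');
            for (int i = 0; i < Units.Length; i++)
            {
                totals.TryGetValue(Units[i], out var sum);
                totals[Units[i]] = sum + double.Parse(fields[i], CultureInfo.InvariantCulture);
            }
        }
    }
}
```

Called as `ReadingsAggregator.Accumulate(File.ReadLines(path), totals)`, only one input line is resident at a time while the small per-unit totals grow.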

If that doesn't help, because your results just keep piling up, the best way to handle that is to offload them out of memory. Put them in a relational database, for example, as fast as they come in, so that they end up on disk instead. Then, at the end of the day, run a query that pulls in the day's results (HINT: this is a lot easier if you have a timestamp field that gets set when you insert the records) and write them out to their final form in your output files.
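The offloading idea could look roughly like this, assuming the Microsoft.Data.Sqlite NuGet package; the `readings` table and its columns are made up for illustration:

```csharp
using Microsoft.Data.Sqlite;

static class ReadingStore
{
    // Insert each result as soon as it is computed; the timestamp
    // column is what makes the end-of-day query trivial.
    public static void Save(SqliteConnection conn, string unit, double value, string timestamp)
    {
        using var cmd = conn.CreateCommand();
        cmd.CommandText =
            "CREATE TABLE IF NOT EXISTS readings (unit TEXT, value REAL, ts TEXT);" +
            "INSERT INTO readings (unit, value, ts) VALUES ($u, $v, $t);";
        cmd.Parameters.AddWithValue("$u", unit);
        cmd.Parameters.AddWithValue("$v", value);
        cmd.Parameters.AddWithValue("$t", timestamp);
        cmd.ExecuteNonQuery();
    }

    // At the end of the day, pull one unit's readings back out in order,
    // ready to be written to that unit's daily output file.
    public static SqliteDataReader ReadDay(SqliteConnection conn, string unit, string day)
    {
        var cmd = conn.CreateCommand();
        cmd.CommandText =
            "SELECT value, ts FROM readings WHERE unit = $u AND ts LIKE $d || '%' ORDER BY ts";
        cmd.Parameters.AddWithValue("$u", unit);
        cmd.Parameters.AddWithValue("$d", day);
        return cmd.ExecuteReader();
    }
}
```

With a `Data Source=:memory:` connection string this can even be tested without touching disk; a file-based database gives the actual spill-to-disk behaviour described above.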

OTHER TIPS

You can use a performance counter from the System.Diagnostics namespace.
Use the two-string constructor with a categoryName of "Memory" and a counterName of "Available Bytes". You can then use something like
long availableBytes = Convert.ToInt64(myPerformanceCounter.NextValue()) to get a reading.
If you need more information than that provides, you can experiment with different categories and counters.
Important note: the PerformanceCounter class implements the IDisposable interface, so be sure to either wrap it in a using statement or call the Dispose() method when you are done.
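Put together, the snippet could look like this. Note that this counter is Windows-only, and on .NET Core/5+ it additionally requires the System.Diagnostics.PerformanceCounter package:

```csharp
using System;
using System.Diagnostics;

class AvailableMemoryDemo
{
    static void Main()
    {
        // "Memory"/"Available Bytes" is a machine-wide counter, so no
        // instance name is needed; the using statement guarantees
        // Dispose() is called when the block is left.
        using (var counter = new PerformanceCounter("Memory", "Available Bytes"))
        {
            long availableBytes = Convert.ToInt64(counter.NextValue());
            Console.WriteLine($"Available physical memory: {availableBytes:N0} bytes");
        }
    }
}
```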

If you know the maximum amount of space you will need for your output, and it fits into your address space, you can use a function such as VirtualAlloc() to reserve a range of memory addresses, with or without committing pages. This won’t allocate any physical pages until you actually try to write to them. Therefore, you won’t get any out-of-memory errors just for reserving the memory region. You can also open the input files as memory-mapped files, so that the OS will only copy the physical pages from disk when you read them.

At that point, you won't need to make any additional calls to allocate your memory: when physical memory runs low, the OS will simply page out the pages you aren't using, or you can VirtualFree() the pages after they're written.
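VirtualAlloc() and VirtualFree() are native Win32 calls reached via P/Invoke; for the input side, the managed counterpart is System.IO.MemoryMappedFiles, where the OS likewise faults pages in from disk only as they are read. A small sketch, with a hypothetical file name:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MappedInputDemo
{
    // Read the first character of a file through a memory-mapped view;
    // the OS brings physical pages in only as they are touched.
    public static char FirstChar(string path)
    {
        using var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open);
        using var accessor = mmf.CreateViewAccessor();
        return (char)accessor.ReadByte(0);
    }

    static void Main()
    {
        // Hypothetical input file matching the question's naming scheme.
        string path = Path.Combine(Path.GetTempPath(), "2015_08_31_00_00_00.txt");
        File.WriteAllText(path, "230.1;4.2;966.4;OK");

        Console.WriteLine(FirstChar(path)); // prints "2"

        File.Delete(path);
    }
}
```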

Licensed under: CC-BY-SA with attribution