Question

I rarely turn here for help, but this is driving me crazy: I'm reading an XML file that wraps an arbitrary number of items, each with a Base64-encoded file (and some accompanying metadata for it). Originally I just read the whole file into an XmlDocument, and while that made for much cleaner code, I realized there's no limit on the size of the file, and XmlDocument eats a lot of memory and can run out if the file is large enough. So I rewrote the code to use XmlTextReader instead, which works great if the issue is that the program was sent an XML file with a large number of reasonably-sized attachments... but there's still a big problem, and that's where I turn to you:

If my XML reader is at a File element, that element contains a value that's enormous (say, 500MB), and I call reader.ReadElementContentAsString(), I now have a string that occupies 500MB (or, possibly, an OutOfMemoryException). What I would like to do in either case is just write to a log, "that file attachment was totally way too big, we're going to ignore it and move on", then move on to the next file. But it doesn't appear that the string I just tried to read is ever garbage collected, so what actually happens is that the string takes up all the RAM, and every other file the program tries to read after that also throws an OutOfMemoryException, even though most of the files are quite small.

Recall: at this point, I'm reading the element's value into a local string, so I would have expected it to be eligible for garbage collection immediately (and thus to be garbage collected, at the latest, when the program attempts to read the next item and discovers it has no memory available). But I've tried everything, just in case: setting the string to null, calling GC.Collect() explicitly... no dice. Task Manager indicates the GC only collected about 40K of the ~500MB it had just requested to store the string, and I still get OutOfMemoryExceptions when attempting to read anything else.

There doesn't seem to be any way to know the length of the value contained in an XML element using XmlTextReader without reading that element, so I imagine I'm stuck reading the string... am I missing something, or is there really no way to read a giant value from an XML file without totally destroying your program's ability to do anything further afterwards? I'm going insane over this.

I have read a bit about C#'s GC and the LOH, but nothing I read indicated to me that this would happen...

Let me know if you need any further information, and thanks!

edit: I did realize the process was running as a 32-bit process, which meant it was being starved for memory more than it should have been. Having fixed that, this is less of an issue, but it's still behavior I'd like to fix. (It takes more and/or larger files to reach the point where an OutOfMemoryException is thrown, but once it is thrown, I still can't reclaim that memory in a timely fashion.)


Solution

I had a similar issue with a SOAP service used to transfer large files as Base64 strings.

I used XDocument instead of XmlDocument back then; that did the trick for me.
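For what it's worth, XDocument on its own still materializes the whole document in memory. A common hybrid (sketched below, not necessarily what was done here; the `File` element name is an assumption based on the question) is to walk the file with an XmlReader and materialize only one element at a time via XNode.ReadFrom:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingItems
{
    // Yields each <File> element as an XElement without loading the whole
    // document into memory. "File" is an assumed name; substitute your own.
    public static IEnumerable<XElement> ReadFileElements(TextReader input)
    {
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "File")
                {
                    // XNode.ReadFrom consumes the element and leaves the reader
                    // positioned on the following node, so don't Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

This keeps the working set proportional to one item rather than to the whole file, though a single oversized attachment will still be fully materialized for its own iteration.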

OTHER TIPS

You can use the XmlReader.ReadValueChunk method to read the contents of an element one "chunk" at a time instead of trying to read the whole content at once. That way you can, for example, decide at some point that the data is too large, then skip it and log the event. StringBuilder is probably the best way to combine the collected char-array chunks into one string.
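To make that concrete, here is a minimal sketch. The `File` element name and the character cap are made-up values for illustration; the point is that the cap bounds how much memory a single oversized value can ever claim:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ChunkedReadDemo
{
    // Reads the current element's text content in bounded chunks.
    // Returns null when the content exceeds maxChars, so the caller
    // can log the event and move on to the next element.
    public static string ReadElementContentCapped(XmlReader reader, int maxChars)
    {
        if (!reader.CanReadValueChunk)
            return reader.ReadElementContentAsString(); // fallback, unbounded

        var sb = new StringBuilder();
        var buffer = new char[4096];

        reader.Read(); // move from the element start to its text node
        int read;
        while ((read = reader.ReadValueChunk(buffer, 0, buffer.Length)) > 0)
        {
            if (sb.Length + read > maxChars)
            {
                // Too big: drain the rest of the value without storing it.
                while (reader.ReadValueChunk(buffer, 0, buffer.Length) > 0) { }
                return null;
            }
            sb.Append(buffer, 0, read);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // "File" and the 1024-char cap are illustrative values only.
        string xml = "<Items><File>small</File><File>"
                     + new string('x', 5000) + "</File></Items>";
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.ReadToFollowing("File"))
            {
                string content = ReadElementContentCapped(reader, 1024);
                Console.WriteLine(content == null
                    ? "attachment too big, skipping"
                    : $"read {content.Length} chars");
            }
        }
    }
}
```

Because the oversized value is drained in a fixed-size buffer rather than accumulated, only at most one chunk plus the capped StringBuilder ever lives on the heap at a time.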

If you want to release memory with GC.Collect(), you can force immediate finalization and memory release with GC.WaitForPendingFinalizers(). This may affect performance (or even hang; see the description behind the link), but you should get rid of the large objects, assuming you no longer have any live references to them (i.e. the local variables are already out of scope or have been set to null), and continue operations normally. You should of course use this as a last resort, when memory consumption is an issue and you really need to force the excess allocations to be released.

I have successfully used the GC.Collect(); GC.WaitForPendingFinalizers(); combination in a memory-sensitive environment to keep the memory footprint of an application well under 100MB, even when it reads through some really large XML files (>100MB). To improve performance I also used Process.PrivateMemorySize64 to track memory consumption and force finalization only after a certain limit was reached. Before my improvements, memory consumption sometimes rose to over 1GB!
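A sketch of that threshold approach (the 100MB limit is an arbitrary illustrative value, not a recommendation): check the cheap memory counter first, and only pay for a forced collection when the limit is exceeded:

```csharp
using System;
using System.Diagnostics;

public static class MemoryGuard
{
    // Forces a full collection only when the process's private memory
    // exceeds limitBytes; returns whether a collection was triggered.
    public static bool CollectIfOverLimit(long limitBytes)
    {
        using (var proc = Process.GetCurrentProcess())
        {
            if (proc.PrivateMemorySize64 <= limitBytes)
                return false; // cheap check, no GC cost
        }

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect(); // reclaim objects freed by the finalizers themselves
        return true;
    }

    public static void Main()
    {
        // Example: call between items; 100MB is an arbitrary limit.
        bool collected = MemoryGuard.CollectIfOverLimit(100L * 1024 * 1024);
        Console.WriteLine(collected ? "forced a collection" : "under limit, skipped");
    }
}
```

The second GC.Collect() after WaitForPendingFinalizers() matters: objects resurrected into the finalization queue are only actually reclaimed on the following collection.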

I am not positive this is the case, but I think you need to dispose of the XmlTextReader. Save the path (e.g. an XPath) of the node after the excessively large node to a string, set your massive string to null, then dispose of the XmlTextReader and reopen it at the node after the large one. From what I understand, if you set your string to null, or it goes out of scope, the GC should free that memory ASAP. It seems more likely to me that you're freeing the string but continuing to do operations with the XmlTextReader, which is now holding onto a ton of memory.

Another idea that came to mind was to do this within an unsafe block and then free the memory explicitly; however, that doesn't look possible (someone else might know better, but after looking around a bit, it seems memory allocated in an unsafe block is still garbage collected; the block just gives you pointers). Yet another option, although IMO a terrible one, would be to write a parsing DLL in C or C++ and call it from your C# project.

Try the first suggestion before doing anything crazy like the last one :)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow