Question

TL;DR

How much memory does keeping a file open take up on a modern Windows system? Some application workloads will need to open "a lot" of files. Windows is very capable of opening "a lot" of files, but what is the cost of keeping a single file open, so that one can decide when "a lot" becomes "too much"?

Background

For sequential processing of large-ish datasets (100s of MB to a few GB) inside a 32-bit process, we need to come up with a buffer that stores its contents on disk instead of in memory.

We have fleshed out a little class without too much trouble (using CreateFile with FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE).

The problem is, the way these buffers will be used is such that each buffer (each temporary file) can potentially store from a few bytes up to a few GB of data, and we would like to keep the buffer class itself as minimal and as general as possible.

The use case ranges from 100 buffers with ~100MB each to 100,000s of buffers with just a few bytes each. (And yes, it is important that each buffer in this sense has its own file.)

It would seem natural to include a threshold in the buffer class, so that it only starts creating and using a temporary on-disk file once it actually stores more bytes than the memory overhead of creating and referencing that temporary file costs -- both inside the process and as load on physical machine memory.
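For illustration, a minimal sketch of what such a threshold could look like is shown below. The class name, the placeholder threshold value, the fixed file path, and the lack of error handling are all assumptions made for the example, not our actual code:

#include <windows.h>
#include <tchar.h>
#include <vector>

// Hypothetical sketch of a disk-backed buffer with an in-memory threshold.
// kSpillThreshold is a placeholder; finding the right value for it is
// exactly what this question is about. Error handling is omitted.
class TempBuffer {
    static const size_t kSpillThreshold = 4096; // placeholder value

    std::vector<char> memory_; // used while the content is small
    HANDLE file_;              // created lazily once the threshold is crossed

    void Spill() {
        // Create the temporary file (same flags as mentioned above) and
        // flush whatever is already buffered in memory into it.
        file_ = ::CreateFile(_T("example.tmp"), // placeholder path
            GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ, NULL, CREATE_NEW,
            FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE, NULL);
        if (!memory_.empty()) {
            DWORD written = 0;
            ::WriteFile(file_, &memory_[0], (DWORD)memory_.size(), &written, NULL);
            memory_.clear();
        }
    }

public:
    TempBuffer() : file_(INVALID_HANDLE_VALUE) {}

    ~TempBuffer() {
        if (file_ != INVALID_HANDLE_VALUE)
            ::CloseHandle(file_); // FILE_FLAG_DELETE_ON_CLOSE removes the file
    }

    void Append(const char* data, size_t size) {
        if (file_ == INVALID_HANDLE_VALUE) {
            if (memory_.size() + size <= kSpillThreshold) {
                memory_.insert(memory_.end(), data, data + size);
                return;
            }
            Spill();
        }
        DWORD written = 0;
        ::WriteFile(file_, data, (DWORD)size, &written, NULL);
    }
};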

Question

How much memory, in bytes, does opening a (temporary) file take up on a modern Windows system?

  • Using CreateFile with FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE
  • Bytes of the virtual address space of the (32 bit) process opening the file
  • Bytes of the physical memory on the machine (including any kernel data structures)

That is, at what threshold, in bytes, do you start seeing a net main-memory gain (both in-process and physically) from storing data in a file instead of in memory?

Notes:

The open-file limit mentioned in the comments does not apply to CreateFile, only to the MS CRT file API. (Opening 10,000s of files via CreateFile is no problem at all on my system -- whether it's a good idea is an entirely different matter and not part of this question.)

Memory-mapped files: totally unsuitable for processing GBs of data in a 32-bit process, because you cannot reliably map such large datasets into the normal 2 GB address range of a 32-bit process. They are useless for my problem and do not, in any way, relate to the actual question. Plain files are just fine for the background problem.

Looked at http://blogs.technet.com/b/markrussinovich/archive/2009/09/29/3283844.aspx, which tells me that a HANDLE itself takes up 16 bytes on a 64-bit system, but that's just the handle.

Looked at STXXL and its docs, but the library is not appropriate for my task, nor did I find any mention of a useful threshold at which to start actually using files.


Useful comments summary:

Raymond writes: "The answer will vary depending on what antivirus software is installed, so the only way to know is to test it on the production configuration."

qwm writes: "I would care more about cpu overhead. Anyway, the best way to answer your question is to test it. All I can say is that size of _FILE_OBJECT alone (including _OBJECT_HEADER) is ~300b, and some of its fields are pointers to other related structures."

Damon writes: "One correct answer is: 10 bytes (on my Windows 7 machine). Since nobody else seemed it worthwhile to actually try, I did (measured difference in MEMORYSTATUSEX::ullAvailVirtual over 100k calls, nothing else running). Don't ask me why it isn't 8 or 16 bytes, I wouldn't know. Took around 17 seconds of kernel time, process had 100,030 handles open upon exiting. Private working set goes up by 412k during run whereas global available VM goes down by 1M, so roughly 60% of the memory overhead is inside the kernel. (...)"

"What's more stunning is the huge amount of kernel time (which is busy CPU time, not something like waiting on disk!) that CreateFile obviously consumes. 17 seconds for 100k calls boils down to around 450,000 cycles for opening one handle on this machine. Compared to that, the mere 10 bytes of virtual memory going away are kind of negligible."


Solution

I now did some measurements:

  • I set up a 2 GB RAM disk so as not to mess up my normal NTFS file table.
  • I created 1M files (1,000,000) in a loop and checked various system performance measures via perfmon.

The call to create a temporary file (and I keep its handle until the end) looks like this:

HANDLE CreateNewTempFile(LPCTSTR filePath) {
    return ::CreateFile(
        filePath, 
        GENERIC_READ | GENERIC_WRITE, // reading and writing
        FILE_SHARE_READ, // Note: FILE_FLAG_DELETE_ON_CLOSE will also block readers, unless they specify FILE_SHARE_DELETE 
        /*Security:*/NULL, 
        CREATE_NEW, // only create if does not exist
        FILE_ATTRIBUTE_TEMPORARY | // optimize access for temporary file
        FILE_FLAG_DELETE_ON_CLOSE, // delete once the last handle has been closed
        NULL);
}
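A driver loop around this function could look roughly like the following sketch (the RAM-disk drive letter and the file-naming scheme are my assumptions, not part of the original test code):

#include <windows.h>
#include <tchar.h>
#include <vector>

// Create 1,000,000 temp files on the RAM disk and keep every handle open so
// that the per-file cost shows up in the perfmon counters. Drive letter and
// naming scheme are assumptions; error handling is omitted.
void RunTest() {
    std::vector<HANDLE> handles;
    handles.reserve(1000000);
    for (int i = 0; i < 1000000; ++i) {
        TCHAR path[MAX_PATH];
        _stprintf_s(path, _T("R:\\tmp%07d.tmp"), i);
        HANDLE h = CreateNewTempFile(path);
        if (h != INVALID_HANDLE_VALUE)
            handles.push_back(h);
    }
    // ...sample the counters while all handles are held open...
    for (size_t i = 0; i < handles.size(); ++i)
        ::CloseHandle(handles[i]); // also deletes the files (DELETE_ON_CLOSE)
}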

The results are:

  • After all temp files have been deleted again, the RAM disk usage is as follows:
    • Total: 2060 MB
    • Used: 1063 MB
    • Free: 997 MB
  • Comparing start and end values (with a few samples in between) I conclude the following average memory consumption per open (temp) file:
    • Memory/Available Bytes - approx. 4k bytes decrease per open file (lots of jitter on this counter, obviously, as this test ran for 10 minutes)
    • Memory/Pool Paged Bytes - approx. 3k bytes per open file
    • Memory/Pool Nonpaged Bytes - approx. 2.2k bytes per open file
  • What's also interesting is that the process memory load was not really increasing in any significant way (as tracked by Process/Working Set).

Note that I also tracked paging, and the page file was not utilized at all (as I would hope, since this machine has 16 GB of RAM and at the lowest point I still had ~4 GB free).
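For completeness, the same counters can also be sampled programmatically via the PDH API instead of watching perfmon by hand. A sketch (counter paths as used above, error handling omitted) could look like this:

#include <windows.h>
#include <pdh.h>
#include <stdio.h>
#pragma comment(lib, "pdh.lib")

// Read one formatted counter value as a 64-bit integer. The counters used
// here (Available Bytes, Pool Paged Bytes, Pool Nonpaged Bytes) are
// instantaneous, so a single collection per read is sufficient.
static LONGLONG ReadCounter(PDH_HQUERY query, PDH_HCOUNTER counter) {
    PDH_FMT_COUNTERVALUE value;
    ::PdhCollectQueryData(query);
    ::PdhGetFormattedCounterValue(counter, PDH_FMT_LARGE, NULL, &value);
    return value.largeValue;
}

int main() {
    PDH_HQUERY query;
    PDH_HCOUNTER available, pagedPool, nonpagedPool;
    ::PdhOpenQuery(NULL, 0, &query);
    ::PdhAddEnglishCounterW(query, L"\\Memory\\Available Bytes", 0, &available);
    ::PdhAddEnglishCounterW(query, L"\\Memory\\Pool Paged Bytes", 0, &pagedPool);
    ::PdhAddEnglishCounterW(query, L"\\Memory\\Pool Nonpaged Bytes", 0, &nonpagedPool);

    // Take one sample before and one after the file-creation loop and compare.
    wprintf(L"available=%lld paged=%lld nonpaged=%lld\n",
            ReadCounter(query, available),
            ReadCounter(query, pagedPool),
            ReadCounter(query, nonpagedPool));

    ::PdhCloseQuery(query);
    return 0;
}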

Licensed under: CC-BY-SA with attribution