Question

I'm currently writing a program that needs to handle out-of-core data. It processes files ranging in size from 1 MB up to 50 GB (and possibly larger in the future).

I have read several tutorials on memory-mapped files and am now using them to manage data I/O, i.e. reading and writing data from/to the hard drive.

I also process the data and need some temporary arrays of the same size as the data itself. My question is whether I should use memory-mapped files for those as well, or whether I should let the OS manage the memory without explicitly defining memory-mapped files. The problem is as follows:

I'm working on multiple platforms, but always on 64-bit systems. In theory, the 64-bit virtual address space is definitely sufficient for my needs. However, on Windows the maximum virtual address space seems to be limited by the operating system: a user can set whether paging is allowed and how large the virtual memory may grow. I also read somewhere that the maximum virtual address space on 64-bit Windows isn't 2^64 bytes but somewhere around 2^40, which would still be sufficient for me but seems like a rather odd limitation. Furthermore, Windows has some strange restrictions, such as arrays with a maximum of 2^31 elements, independent of the array type. I don't know how all of this is handled on Linux, but I assume it is treated similarly; presumably the maximum allowed virtual memory is something like RAM + swap partition size? So there are a lot of things to struggle with if I want the system to handle data that exceeds the RAM size. I don't even know whether I can use the entire 64-bit virtual address space from C++. In a short test, I got a compiler error when trying to initialize more than 2^31 elements, but I think it is easy to go beyond that by using std::vector and the like.

On the other hand, with a memory-mapped file every one of my memory writes will eventually be written to the HDD. Especially for data sets smaller than my physical RAM, this could be a fairly big bottleneck. Or does the OS avoid writing until it has to, because the RAM is exhausted? The advantages of memory-mapped files show up in inter-process communication via shared memory, or in persistence across runs, e.g. I start the application, write something, quit, later restart it and efficiently read only the data I need into RAM. Since I process all the data in a single run with a single process, neither advantage applies in my case.

Note: A streaming approach as an alternative solution is not really feasible, as I heavily depend on random access to the data.

Ideally, I would like a way to process all models independent of their size and of limits set by the operating system: keep everything that fits in RAM, and only when the physical limit is exceeded fall back to memory-mapped files or other mechanisms (if there are any) to page out the excess data, ideally managed by the operating system.

To conclude: what is the best approach to handle this temporary data? If it can be done without memory-mapped files and in a platform-independent way, could you give me a code snippet or similar and explain how it works around these OS limitations?


Solution 2

Maybe a bit late, but it's an interesting question.

On the other hand, with a memory-mapped file every one of my memory writes will eventually be written to the HDD. Especially for data sets smaller than my physical RAM, this could be a fairly big bottleneck. Or does the OS avoid writing until it has to, because the RAM is exhausted?

To avoid writing to disk while there's enough memory, you should open the file as 'temporary' (FILE_ATTRIBUTE_TEMPORARY) combined with FILE_FLAG_DELETE_ON_CLOSE. This hints to the OS that it should delay writing to disk for as long as possible.
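For reference, a minimal Win32 sketch of that idea (error handling omitted; the file name "scratch.tmp" and the 1 GiB size are placeholder choices): the scratch file is created with both flags and then mapped, so dirty pages stay in the file cache as long as the system can afford it, and the file disappears once the last handle is closed.

```
#include <windows.h>
#include <cstdint>

int main() {
    const uint64_t size = 1ull << 30; // 1 GiB of temporary working space (placeholder)

    // Temporary + delete-on-close: hints the cache manager to keep pages in RAM
    // and removes the file automatically when all handles are gone.
    HANDLE file = CreateFileW(L"scratch.tmp",
                              GENERIC_READ | GENERIC_WRITE,
                              0, nullptr, CREATE_ALWAYS,
                              FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
                              nullptr);

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READWRITE,
                                        static_cast<DWORD>(size >> 32),
                                        static_cast<DWORD>(size & 0xFFFFFFFFu),
                                        nullptr);

    void* view = MapViewOfFile(mapping, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, 0);

    auto* data = static_cast<double*>(view);
    data[0] = 42.0; // dirty pages are written back only when the OS decides to

    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file); // file is deleted once the last handle to it is closed
    return 0;
}
```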

As for limitations on array size: it's probably best to provide your own data structures and your own access layer on top of the mapped views. For big data sets you may want to use several different (smaller) mapped views, which you can map and unmap as needed, as in the sketch below.
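One possible shape for such a layer, using Boost.Interprocess so it stays portable (the `WindowedFile` class name and the 256 MiB chunk size are illustrative choices, and the sketch assumes the file size is a multiple of the chunk size): only one window is mapped at a time and it is remapped whenever an access falls outside the current chunk.

```
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstddef>
#include <cstdint>

namespace bip = boost::interprocess;

class WindowedFile {
    static const std::size_t kChunkBytes = 256 * 1024 * 1024; // multiple of the page size

    bip::file_mapping  mapping_;
    bip::mapped_region region_;
    std::uint64_t      region_start_ = static_cast<std::uint64_t>(-1);

public:
    explicit WindowedFile(const char* path)
        : mapping_(path, bip::read_write) {}

    // Returns a pointer to the byte at 'offset', remapping the window if needed.
    // Assumes 'offset' lies within the file and the file is chunk-aligned.
    unsigned char* at(std::uint64_t offset) {
        std::uint64_t start = (offset / kChunkBytes) * kChunkBytes;
        if (start != region_start_) {
            // Assigning a new region unmaps the old view and maps the new window.
            region_ = bip::mapped_region(mapping_, bip::read_write, start, kChunkBytes);
            region_start_ = start;
        }
        return static_cast<unsigned char*>(region_.get_address()) + (offset - start);
    }
};
```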

OTHER TIPS

As nobody answered, I will update the status of the question myself.

Luckily, I came across the Boost.Interprocess library today and found managed_mapped_file, which even allows me to allocate vectors in the mapped range, making them nearly as easy to use as if I weren't programming with mapped files at all.
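A minimal sketch of what this looks like, following the Boost.Interprocess documentation (the file name "temp_data.bin", the 1 GiB segment size and the object name "Values" are placeholders I chose):

```
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/allocators/allocator.hpp>

namespace bip = boost::interprocess;

typedef bip::allocator<double, bip::managed_mapped_file::segment_manager> DoubleAllocator;
typedef bip::vector<double, DoubleAllocator> MappedVector;

int main() {
    // Create (or open) the backing file with 1 GiB of addressable space.
    bip::managed_mapped_file file(bip::open_or_create, "temp_data.bin", 1ull << 30);

    // Construct a named vector inside the mapped file; its allocator draws
    // memory from the file's segment manager, not from the process heap.
    MappedVector* v = file.find_or_construct<MappedVector>("Values")(
        DoubleAllocator(file.get_segment_manager()));

    v->resize(100 * 1000 * 1000); // ~800 MB of doubles, paged in and out by the OS
    (*v)[0] = 3.14;

    return 0;
}
```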

Additionally, I found that:

If several processes map the same file, and a process modifies a memory range from a mapped region that is also mapped by another process, the changes are immediately visible to the other processes. However, the file contents on disk are not updated immediately, since that would hurt performance (writing to disk is several times slower than writing to memory). If the user wants to make sure that the file's contents have been updated, it can flush a range from the view to disk.

http://www.boost.org/doc/libs/1_54_0/doc/html/interprocess/sharedmemorybetweenprocesses.html
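If I ever do need to force dirty pages out at a specific point, a mapped_region can be flushed explicitly; a minimal sketch (the `checkpoint` helper name is just illustrative):

```
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>

namespace bip = boost::interprocess;

void checkpoint(bip::mapped_region& region) {
    // flush(offset, size, async): 0/0 means the whole region,
    // async = false blocks until the data has reached the disk.
    region.flush(0, 0, false);
}
```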

So hopefully it starts writing only once I exceed the system's physical RAM. I haven't done any speed measurements yet and probably won't do any.

I can live with this solution quite well for now. However, I will leave this question unanswered and open. At some point somebody might find it and give more hints, such as how to prevent flushing of the data until it is actually necessary, or other ideas/tips on how to handle out-of-core data.

Licensed under: CC-BY-SA with attribution