Question

First of all, before anyone complains, I realize that within the perspective of theoretically perfect C++ code, the memory model is an implementation detail that I should not rely on. However, I'm favouring performance over discipline.

Here's the scenario: I have a region of address space which I've told the OS to back with a file of my choice - that is, the file is memory mapped. If my understanding of how VMMs typically work is correct, the OS may be quite lazy about the loading of pages into my mapping and may only do this when the page actually gets touched.

Normally I could ignore this detail, but in this particular case, I am sending the mapped data into a worker thread pool. If I just naively pass the worker a pointer to this buffer, there's a good chance the worker thread itself will be the one to hit a page fault when the page is first touched, and this will cause the worker to block until the page is physically loaded in by the VMM.

The design of the worker pool is such that it's very bad to have its threads blocking on I/O, whereas the thread sending in the job can tolerate being blocked. Therefore, I want to have my sender thread touch the mapping's page(s) first so that the page faults will block it instead.

(I understand that there's no guarantee that touching the page first will stop a subsequent page fault in the worker thread, but still the program will be optimal most of the time and correct all of the time.)

In x86 assembly language this would be trivial:

; get the page's address in ebx
mov al, Byte Ptr [ebx]

Unfortunately, it's not so simple in C or C++. A naive implementation would be:

char *pPage = ...;
char Dummy = *pPage;

But, this probably won't work because any self-respecting optimizer will realize that the code does nothing and simply omit it.

We could use inline assembly, but this may badly cripple the optimizer. We could call an assembly language function to do it, but then we needlessly incur the (admittedly small) function call overhead.

We could instead make Dummy an externally-visible variable, which would work, because the compiler couldn't assume the assignment to be meaningless. However, this could seriously degrade performance in multi-core systems by causing contention over the CPU cache line holding Dummy. (Not to mention, we waste that cache line and access.)

I also thought of doing this:

char volatile *pPage = ...;
char Dummy = *pPage;

I know the volatile keyword makes two guarantees:

  • The compiler won't reorder accesses; and

  • The compiler won't assume the value will be identical between successive reads.

However, this does not seem to guarantee that the compiler will read the value even if it doesn't need it.

Any ideas?


Solution

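A read through a volatile-qualified lvalue counts as observable behavior, so the compiler must actually perform the memory access even when the result is discarded. A simple solution is therefore exactly what you suggested: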

volatile char *prefetch_me = ...;
(void)*prefetch_me;
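
To extend this to a whole buffer, the sender thread can read one byte per page before handing the job to the pool. The following is only a minimal sketch: touch_pages and its page_size parameter are hypothetical names introduced here, and the caller is assumed to pass the real page size (for example from sysconf(_SC_PAGESIZE) on POSIX systems). Each byte is read into a local variable so that the volatile read is unambiguous under older language standards too.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads one byte from every page that overlaps
// [addr, addr + len) and returns the number of pages touched.
// Assumes the whole range is mapped and readable.
std::size_t touch_pages(const void* addr, std::size_t len, std::size_t page_size) {
    if (addr == nullptr || len == 0 || page_size == 0) {
        return 0;
    }
    const volatile unsigned char* p =
        static_cast<const volatile unsigned char*>(addr);

    // addr need not be page aligned, so find where the next page starts.
    const std::size_t offset_in_page =
        reinterpret_cast<std::uintptr_t>(addr) % page_size;

    // Touch the (possibly partial) first page.
    unsigned char sink = p[0];
    std::size_t touched = 1;

    // Touch the first byte of every following page inside the range.
    for (std::size_t i = page_size - offset_in_page; i < len; i += page_size) {
        sink = p[i];  // volatile read: the compiler may not drop it
        ++touched;
    }
    (void)sink;
    return touched;
}
```

The sender would call something like touch_pages(pPage, length, page_size) just before queuing the job. As noted in the question, this only makes a later fault in the worker less likely; the pages may still be evicted before the worker runs.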

However, if you want to touch multiple pages in a (potentially) more efficient manner, and you're running on a Unix-like system, look at madvise(), specifically the MADV_WILLNEED and/or MADV_SEQUENTIAL advice values. From the man page:

  • MADV_WILLNEED - Expect access in the near future. (Hence, it might be a good idea to read some pages ahead.)
  • MADV_SEQUENTIAL - Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
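
A minimal sketch of the madvise() route follows, assuming a POSIX-like system that provides <sys/mman.h>. The wrapper name hint_will_need is hypothetical. Note that madvise() expects a page-aligned start address (the pointer returned by mmap() already is), and that the call is only a hint: the kernel may start read-ahead asynchronously or ignore the advice, so it does not guarantee the worker will never fault.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical wrapper: asks the kernel to start reading the given mapped
// range ahead of use. addr must be page aligned. Returns true if the kernel
// accepted the hint, false otherwise (errno is set by madvise()).
bool hint_will_need(void* addr, std::size_t len) {
    return madvise(addr, len, MADV_WILLNEED) == 0;
}
```

The sender thread could call hint_will_need(mapping, length) right after mmap() and, if it must be sure the pages are resident, still touch them with volatile reads before queuing the job.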
Licensed under: CC-BY-SA with attribution