Question

I'm writing up a document about page faulting and am trying to get some concrete numbers to work with, so I wrote up a simple program that reads 12*1024*1024 bytes of data. Easy:

#include <stdio.h>

int main()
{
  FILE *in = fopen("data.bin", "rb");
  int i;
  int total = 0;
  for (i = 0; i < 1024*1024*12; i++)
    total += fgetc(in);
  printf("%d\n", total);
  return 0;
}

So yes, it goes through and reads the entire file. The issue is that I need the dtrace probe that is going to fire 1536 times during this process (12M/8k). Even if I count all of the fbt:mach_kernel:vm_fault*: probes and all of the vminfo::: probes, I don't hit 500, so I know I'm not finding the right probes.

Anyone know where I can find the dtrace probes that fire when a page is faulted in from disk?

UPDATE:

On the off chance that the stdio functions were doing some intelligent prefetching, I tried the following:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
  int in = open("data.bin", O_RDONLY | O_NONBLOCK);
  int i;
  int total = 0;
  char buf[128];
  for (i = 0; i < 1024*1024*12; i++)
  {
    read(in, buf, 1);
    total += buf[0];
  }
  printf("%d\n", total);
  return 0;
}

This version takes MUCH longer to run (42 s real time, of which 10 s was user time and the rest system time; page faults, I'm guessing), but it still generates only about one fifth as many faults as I would expect.

For the curious, the time increase is not due to loop overhead and casting (char to int); the code version that does just these actions takes 0.07 seconds.
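For reference, the control program alluded to above would presumably look something like this (a hypothetical reconstruction: it keeps the loop, the buffer access, and the char-to-int accumulation, but drops the read(2) call):

/* Hypothetical reconstruction of the control mentioned above: the same
   loop and char-to-int accumulation, with the read(2) call removed. */
#include <stdio.h>

int main()
{
  int i;
  int total = 0;
  char buf[128];
  buf[0] = 1;                          /* arbitrary value to accumulate */
  for (i = 0; i < 1024*1024*12; i++)
    total += buf[0];                   /* same cast and addition, no I/O */
  printf("%d\n", total);
  return 0;
}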


Solution

Not a direct answer, but it seems you are equating disk reads and page faults. They are not necessarily the same. In your code you are reading data from a file into a small chunk of user memory, so the I/O system can read the file into the buffer/VM cache in whatever way and size it sees fit. I might be wrong here; I don't know how Darwin does this.

I think the more reliable test would be to mmap(2) the whole file into process memory and then touch each page in that mapping.
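A minimal sketch of that approach might look like the following. The file name data.bin and the idea of touching one byte per page are carried over from the question; sysconf(_SC_PAGESIZE) is used rather than hard-coding 8 KB, since the page size is a property of the system under test.

/* Minimal sketch: map the whole file and touch one byte per page. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
  int fd = open("data.bin", O_RDONLY);      /* file name from the question */
  if (fd < 0) { perror("open"); return 1; }

  struct stat st;
  if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

  char *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  if (base == MAP_FAILED) { perror("mmap"); return 1; }

  long pagesize = sysconf(_SC_PAGESIZE);    /* don't hard-code 8 KB */
  long total = 0;
  off_t off;
  for (off = 0; off < st.st_size; off += pagesize)
    total += base[off];                     /* touch one byte per page */

  printf("%ld\n", total);
  munmap(base, st.st_size);
  close(fd);
  return 0;
}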

OTHER TIPS

I was down the same rathole recently. I don't have my DTrace scripts or test programs available just now, but I will give you the following advice:

1.) Get your hands on Mac OS X Internals by Amit Singh and read section 8.3 on virtual memory (this will get you in the right frame of reference for selecting DTrace probes).

2.) Get your hands on Solaris Performance and Tools by Brendan Gregg / Jim Mauro. Read the section on virtual memory and pay close attention to the example DTrace scripts that make use of the vminfo provider.

3.) OS X is definitely prefetching large chunks of pages from the filesystem, and your test program is playing right into this optimization (since you're reading sequentially). Interestingly, this is not the case for Solaris. Try randomly accessing the big array to defeat the prefetch.
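One hypothetical way to do that, building on the mmap sketch above, is to visit the pages in a shuffled order rather than front to back; base, npages, and pagesize are assumed to come from an mmap(2) setup like the one shown earlier.

/* Hypothetical variation: touch each page once, but in a random order,
   so sequential readahead cannot stay ahead of the accesses. */
#include <stdlib.h>

long touch_randomly(const char *base, long npages, long pagesize)
{
  long *order = malloc(npages * sizeof *order);
  long i, total = 0;

  for (i = 0; i < npages; i++)
    order[i] = i;
  for (i = npages - 1; i > 0; i--) {        /* Fisher-Yates shuffle */
    long j = rand() % (i + 1);
    long tmp = order[i]; order[i] = order[j]; order[j] = tmp;
  }
  for (i = 0; i < npages; i++)
    total += base[order[i] * pagesize];     /* one touch per page, random order */

  free(order);
  return total;
}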

The assumption that the operating system will fault in each and every page that's being touched as a separate operation (and that therefore, if you touch N pages, you'll see the DTrace probe fire N times) is flawed; most UN*Xes will perform some sort of readahead or pre-faulting, and you're very unlikely to get exactly as many fault operations as you have pages. This is so even if you use mmap() directly.

The exact ratio may also depend on the filesystem, as readahead and page clustering implementations and thresholds are unlikely to be the same for all of them.

You can probably force a per-page fault policy if you use mmap() directly and then apply madvise(MADV_DONTNEED) or similar, and/or purge the entire range with msync(MS_INVALIDATE).
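As a rough illustration of that idea, something like the following could be run between two passes over the mapped range. This is a best-effort experiment rather than a guaranteed recipe: whether the pages are actually dropped is up to the kernel.

/* Building on the earlier mmap sketch: try to evict the mapped pages
   before a second pass, so the second pass has to fault them back in.
   'base' and 'len' are assumed to describe the mmap(2)ed file range. */
#include <stddef.h>
#include <sys/mman.h>

static void try_to_evict(void *base, size_t len)
{
  madvise(base, len, MADV_DONTNEED);   /* hint: these pages are not needed */
  msync(base, len, MS_INVALIDATE);     /* invalidate cached copies of the range */
}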
