Question

In Linux, when a process asks the system for some (virtual) memory, the request is only recorded in a VMA (the descriptor of the process's virtual memory); a physical page is not reserved for every virtual page at the time of the call. Later, when the process accesses such a page, the access faults (it generates a page-fault exception), and the PF# handler allocates a physical page and updates the process's page table.

There are two cases: a fault on a read may be resolved by mapping the zero page (a special, global, pre-zeroed page) which is write-protected; a fault on a write (either to the zero page or to a page that was requested but never physically mapped) results in allocation of an actual private physical page.

For mmaps (and for brk/sbrk, which is internally an mmap too) this mechanism works per page; every mmaped region is registered as a whole in a VMA (it has a begin and an end address). The stack, however, is handled differently, because it has only a start address (the higher one on typical platforms; it grows toward lower addresses).
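To illustrate this demand-paging behaviour for an ordinary anonymous mmap, here is a minimal sketch (assuming Linux with mincore(2); resident_pages is just a helper written for this example, and the exact counts it prints are what I would expect, not something guaranteed):

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many pages of [addr, addr+len) are currently resident in RAM. */
static int resident_pages(void *addr, size_t len) {
    long psize = sysconf(_SC_PAGESIZE);
    size_t npages = (len + psize - 1) / psize;
    unsigned char vec[npages];
    int count = 0;
    if (mincore(addr, len, vec) != 0)
        return -1;
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;
    return count;
}

int main(void) {
    size_t len = 16 * 4096;
    /* Anonymous mapping: only a VMA is created, no physical pages yet. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;
    printf("resident after mmap:  %d\n", resident_pages(p, len));
    p[0] = 1;            /* write fault: one private page gets allocated */
    p[5 * 4096] = 1;     /* another fault, another page */
    printf("resident after touch: %d\n", resident_pages(p, len));
    return 0;
}

I would expect the first printf to report 0 resident pages and the second one 2, i.e. pages appear only when touched.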

The question is:

When I access new, not-yet-allocated memory near the stack, I get a PF# and the stack grows. How is this growing handled if I access not the page adjacent to the stack, but a page that is 10 or 100 pages away from it?

E.g.

#include <alloca.h>

int main() {
  int *a = alloca(100); /* some useful data */
  int *b = alloca(50*4096); /* skip 49 pages */
  int *c = alloca(100);

  a[0] = 1;
  /* no accesses to b - this is an untouched hole of 49 pages */
  c[0] = 1;

}

Will this program get 2 or 50 private physical pages allocated for its stack?

I think it could be profitable to ask the kernel to allocate tens of physical pages in a single page fault rather than take tens of page faults that allocate one page at a time (one interrupt + one context switch + a simple, cache-friendly loop over N page-allocation requests, versus N interrupts + N context switches + N page allocations, during which the mm code may be evicted from the I-cache).
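For ordinary mmaped memory (not the stack) the kernel already offers ways to pre-fault a whole range in one call; a minimal sketch, assuming Linux where MAP_POPULATE and madvise(MADV_WILLNEED) are available:

#include <stddef.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 50 * 4096;

    /* Option 1: MAP_POPULATE asks the kernel to populate (prefault)
       the page tables for the whole mapping up front, so later
       accesses should not fault one page at a time. */
    char *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);

    /* Option 2: map lazily, then hint that the range will be needed.
       For never-touched anonymous memory this is only a hint and may
       do nothing; it mainly helps with file-backed or swapped pages. */
    char *b = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    madvise(b, len, MADV_WILLNEED);

    if (a != MAP_FAILED) a[0] = 1;   /* ideally no further minor fault here */
    if (b != MAP_FAILED) b[0] = 1;
    return 0;
}

As far as I know nothing equivalent exists for automatic stack growth, which is exactly what I am asking about.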


Solution 2

The automatic growing of the stack can be thought of as automatic calls to mremap to resize the virtual address region that counts as "stack". Once that's handled, page faults to the stack area or to a vanilla mmap region are handled the same, i.e., one page at a time.

Thus you should end up with ~2 pages allocated, not ~51. @perreal's empirical answer validates this ...
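To see that VMA-resizing view directly, here is a minimal sketch (assuming Linux, where /proc/self/maps labels the main thread's stack VMA as [stack]; show_stack_vma is just a helper written for this example). It prints the stack VMA before and after touching memory well below the current stack bottom:

#include <alloca.h>
#include <stdio.h>
#include <string.h>

/* Print the [stack] line of /proc/self/maps. */
static void show_stack_vma(const char *label) {
    char line[256];
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return;
    while (fgets(line, sizeof line, f))
        if (strstr(line, "[stack]"))
            printf("%s: %s", label, line);
    fclose(f);
}

int main(void) {
    show_stack_vma("before");
    volatile char *p = alloca(200 * 4096);  /* move SP roughly 800 KiB down */
    p[0] = 1;   /* fault far below the old stack bottom */
    show_stack_vma("after");
    return 0;
}

The start address of the [stack] region should be lower in the second line, i.e. the region was resized, while only the pages actually touched become resident.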

To the last part of the question: the cost of contiguous page faults is one of the factors that led to the development of "huge pages". I don't think there are other ways in Linux to "batch" page-fault handling. Maybe madvise could do something, but I suspect it mostly optimizes the really expensive part of page faults, which is looking up the backing pages on storage. Stack page faults, which map to zero pages, are relatively lightweight by comparison.
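As a rough illustration of the huge-page angle, a sketch assuming a kernel built with transparent huge pages, so that madvise(MADV_HUGEPAGE) is honoured; whether a huge page is actually used depends on the system's THP settings:

#include <stddef.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 4 * 1024 * 1024;   /* 4 MiB = 1024 small pages */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    /* With THP, a single fault can map a 2 MiB huge page
       (512 small pages) instead of taking 512 separate faults. */
    madvise(p, len, MADV_HUGEPAGE);

    for (size_t i = 0; i < len; i += 4096)
        p[i] = 1;   /* touch every 4 KiB; count faults with perf stat */
    return 0;
}

Running this under perf stat and watching the page-faults counter should show noticeably fewer faults when THP actually kicks in, though that depends entirely on the system's THP configuration.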

OTHER TIPS

With this code:

#include <alloca.h>

int main() {
  int *a = alloca(100); /* some useful data */
  int *b = alloca(50*4096); /* skip 49 pages */
  int *c = alloca(100);
  int i;
#if TOUCH > 0
  a[0] = 1;               // [1]
#endif
#if TOUCH > 1
  c[0] = 1;               // [2]
#endif
#if TOUCH > 2
  for (i=0; i<25; i++)    // [3]
    b[i*1024] = 1;
#endif
#if TOUCH > 3
  for (i=25; i<50; i++)   // [4]
    b[i*1024] = 1;
#endif
  return 0;
}

And this script:

for i in 1 2 3 4; do
  gcc d.c -DTOUCH=$i
  echo "Upto [$i]" $(perf stat ./a.out 2>&1 | grep page-faults)
done

The output:

Upto [1] 105 page-faults # 0.410 M/sec
Upto [2] 106 page-faults # 0.246 M/sec
Upto [3] 130 page-faults # 0.279 M/sec
Upto [4] 154 page-faults # 0.290 M/sec