Question

I've been asked to look at an internal application, written in C++ and running on Linux, that's having some difficulties. Periodically it incurs a large number of major page faults (~200k), which increase the wall-clock run time by 10x or more, yet on other runs it has none at all.

I've tried isolating different pieces of the code, but I'm struggling to reproduce the page faults when testing them in isolation.

Does anyone have suggestions for getting more information out of the application or Linux about the major page faults? All I really have is a total count.


Solution

You may like to consider Valgrind, described on its home page as:

Valgrind is an instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. You can also use Valgrind to build new tools.

Specifically, Valgrind contains a tool called Massif, for which the manual gives the following (paraphrased) overview:

Massif is a heap profiler. It measures how much heap memory your program uses. [..]

Heap profiling can help you reduce the amount of memory your program uses. On modern machines with virtual memory, this provides the following benefits:

  • It can speed up your program -- a smaller program will interact better with your machine's caches and avoid paging.

  • If your program uses lots of memory, it will reduce the chance that it exhausts your machine's swap space.
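In practice you run the program under Massif and then inspect the results, e.g. valgrind --tool=massif ./yourprog followed by ms_print massif.out.<pid> (where ./yourprog stands in for your binary); the detailed snapshots show where the heap grows and which allocation sites dominate.

If you also want numbers straight from the kernel while the job is running, here is a minimal sketch (assuming Linux; the major_faults helper and the "suspect phase" are only placeholders for your own code) that uses getrusage(2) to snapshot the process's major-fault counter around a suspect phase, giving you per-phase deltas rather than just the final total:

```cpp
#include <sys/resource.h>
#include <iostream>

// Returns the process's cumulative major page-fault count
// (faults that required I/O), as reported by the kernel.
static long major_faults()
{
    rusage ru{};
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_majflt;
}

int main()
{
    long before = major_faults();

    // ... run the phase of the application you suspect ...

    long after = major_faults();
    std::cerr << "major page faults in this phase: "
              << (after - before) << '\n';
}
```

Bracketing the candidate phases like this should tell you which part of the run is actually touching pages that have been paged out, which you can then correlate with what Massif reports about heap growth.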
