Xperfview: What's the difference between CPU sampling and CPU Usage?

https://stackoverflow.com/questions/3567093

01-10-2019

Question

This question pertains to xperf and xperfview, utilities that are part of the Windows Performance Toolkit (in turn part of Windows SDK 7.1).

Comparing two charts, "CPU sampling by thread" and "CPU Usage by thread", there are several differences I don't understand. I'll use audiodg.exe as an example.

In the Threads pulldown, there is only one thread for audiodg on the CPU Sampling chart; the CPU Usage chart shows several audiodg threads.
Both graphs have a Y-axis marked "% Usage", but the measurements differ. Typically the % usage for a given thread is lower on the CPU Sampling chart than on the CPU Usage chart.
The CPU Sampling summary table shows Weight and % weight for each module/process. If I load symbols, I can dig pretty deep into the audiodg process. The CPU Scheduling Aggregate Summary table (launched from the CPU Usage graph) shows CPU Usage and % CPU usage -- Weight is not available. (Conversely, CPU Usage is not available on the CPU Sampling summary table.) I cannot dig as deep into audiodg -- I only see the main thread and a few ntdll.dll threads.
The numbers for any process in the % CPU usage and % Weight columns are always different. Sometimes they differ by more than 75%.

So my questions are: what is the reliable measure of CPU usage here? Aren't the CPU Usage numbers derived from the CPU samples? Shouldn't the two sets of numbers be related somehow?

Solution

Xperf does make this a bit confusing. This is my understanding of what's going on: the two charts are built from two different kinds of kernel trace data.

CPU sample data, enabled with the PROFILE kernel flag. It is collected at a regular interval and records what the CPU was doing at that instant (e.g. the process, thread ID, and call stack at the time of the sample).
Context switch data, enabled with the CSWITCH kernel flag. This records every context switch that happens (e.g. which threads were switched in and out, and their call stacks).

CPU sampling by thread shows the number of profile events recorded for each thread, aggregated over small intervals of time across the duration of the trace. For example, if audiodg was executing 10% of the time for 2 seconds, we would expect to see about 10 "% usage" over that time. However, because this is based on sampling, it's possible that at the moments the samples were taken, threads from another process happened to be executing; in other words, some of audiodg's 10% was 'missed' by the sample events.
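The 'missed' effect can be reproduced with a toy model. The sketch below assumes a single CPU, a 1 ms sampling period, and a deliberately contrived schedule in which audiodg.exe runs for 0.1 ms of every millisecond (10% of the CPU). The helper names (`periodic_bursts`, `exact_usage`, `sampled_usage`) are hypothetical, and this is not how xperf itself computes anything.

```python
import bisect
from typing import List, Tuple

# Hypothetical single-CPU timeline: (start_ms, end_ms, thread) runs,
# end exclusive, non-overlapping.
Interval = Tuple[float, float, str]


def periodic_bursts(offset_ms: float, burst_ms: float, duration_ms: int,
                    thread: str) -> List[Interval]:
    """`thread` runs for `burst_ms`, starting `offset_ms` into every millisecond."""
    return [(i + offset_ms, i + offset_ms + burst_ms, thread)
            for i in range(duration_ms)]


def exact_usage(timeline: List[Interval], duration_ms: float, thread: str) -> float:
    """True % CPU: total run time of `thread` divided by trace length."""
    busy = sum(end - start for start, end, name in timeline if name == thread)
    return 100.0 * busy / duration_ms


def sampled_usage(timeline: List[Interval], period_ms: float,
                  duration_ms: float, thread: str) -> float:
    """Estimated % CPU: fraction of periodic samples that land in `thread`."""
    runs = sorted(timeline)
    starts = [start for start, _, _ in runs]
    n_samples = int(duration_ms // period_ms)
    hits = 0
    for i in range(n_samples):
        t = i * period_ms
        idx = bisect.bisect_right(starts, t) - 1  # last run starting at or before t
        if idx >= 0 and t < runs[idx][1] and runs[idx][2] == thread:
            hits += 1
    return 100.0 * hits / n_samples


missed = periodic_bursts(0.5, 0.1, 2000, "audiodg.exe")   # runs between ticks
aligned = periodic_bursts(0.0, 0.1, 2000, "audiodg.exe")  # runs on the ticks
print(round(exact_usage(missed, 2000, "audiodg.exe"), 6))  # 10.0
print(sampled_usage(missed, 1.0, 2000, "audiodg.exe"))     # 0.0
print(sampled_usage(aligned, 1.0, 2000, "audiodg.exe"))    # 100.0
```

With bursts that fall between the sample ticks, sampling reports 0% even though the thread really used 10%; shifting the same bursts onto the ticks makes sampling report 100%. Real schedules are rarely this aligned, so in practice the error is typically statistical rather than systematic, but the example shows why the two charts need not agree.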

CPU Usage by thread is calculated from the context switch data. A thread's 'usage' is the time between its being switched in and its being switched out again (and, of course, this data is also aggregated over small intervals of time).
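A minimal sketch of that calculation, assuming a single CPU and a simplified event tuple `(timestamp, old_thread, new_thread)` with integer timestamps. The record layout and the `usage_from_cswitches` name are hypothetical, not xperf's event format or API. Each thread is charged for the span from its switch-in to the next switch on the same CPU, and that span is split across fixed-size windows to mimic the chart's per-interval aggregation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, simplified context-switch record for ONE CPU:
# (timestamp, old_thread, new_thread). Integer timestamps keep the
# window arithmetic exact.
CSwitch = Tuple[int, str, str]


def usage_from_cswitches(events: List[CSwitch], trace_end: int,
                         window: int) -> Dict[str, List[float]]:
    """% CPU per thread in each `window`-long bucket, from context switches.

    A thread is charged for [switch-in, next switch on this CPU); time
    before the first event is unknown and therefore ignored.
    """
    n_windows = math.ceil(trace_end / window)
    busy: Dict[str, List[int]] = defaultdict(lambda: [0] * n_windows)
    events = sorted(events)
    # The next switch on the same CPU is when the current thread was switched out.
    switch_outs = [t for t, _, _ in events[1:]] + [trace_end]
    for (t_in, _old, thread), t_out in zip(events, switch_outs):
        t = t_in
        while t < t_out:  # split this run across window boundaries
            w = t // window
            edge = min(t_out, (w + 1) * window)
            busy[thread][w] += edge - t
            t = edge
    return {th: [100.0 * b / window for b in per_w]
            for th, per_w in busy.items()}


events = [(0, "Idle", "audiodg.exe"), (3, "audiodg.exe", "explorer.exe"),
          (12, "explorer.exe", "audiodg.exe"), (15, "audiodg.exe", "Idle")]
usage = usage_from_cswitches(events, trace_end=20, window=10)
print(usage["audiodg.exe"])   # [30.0, 30.0]
print(usage["explorer.exe"])  # [70.0, 20.0]
```

Because every switch is recorded, no run time is skipped: in this example the percentages in each window (including Idle) add up to exactly 100%, which is why this measure is exact rather than statistical.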

There are benefits to each data:

CPU sampling will actually tell you what the thread is doing at the time of the sample, because it collects call stacks during the execution of the thread. The context switch information only tells you when a thread gets switched in or out, but nothing about what it did in between.
Context switch information tells you exactly how much time every thread got to execute; this data is exact. Sampling, of course, is only probabilistic.
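To illustrate what the sampled stacks add, here is a minimal sketch, assuming each sample is a hypothetical `(thread, stack)` tuple with the stack listed from root to leaf. For every function it counts how many samples had that function on the stack (an inclusive count, in the spirit of the summary table's Weight column; xperf's actual weighting may differ). The thread and function names are made up. Context switch data alone could only report audiodg's total run time, not this per-function breakdown.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical sample record: (thread, call stack ordered root -> leaf).
Sample = Tuple[str, Tuple[str, ...]]


def inclusive_weights(samples: Iterable[Sample], thread: str) -> Counter:
    """For each function, count the samples of `thread` with it on the stack."""
    weights: Counter = Counter()
    for name, stack in samples:
        if name == thread:
            # set(): a recursive function counts once per sample, not per frame
            weights.update(set(stack))
    return weights


samples = [
    ("audiodg", ("main", "process_audio", "mix")),
    ("audiodg", ("main", "process_audio", "mix")),
    ("audiodg", ("main", "process_audio", "resample")),
    ("audiodg", ("main", "decode")),
    ("explorer", ("main", "paint")),
]
weights = inclusive_weights(samples, "audiodg")
total = len(samples)
for func, count in weights.most_common():
    print(f"{func:15} weight={count}  % of all samples={100 * count / total:.0f}")
```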

So to answer your question, the CPU Usage chart is "more accurate" for understanding how much time each thread was executing. However, don't rule out the use of the sampling data because it can be much more helpful for understanding where your threads were actually spending their time! For the CPU sampling data, the summary table is more valuable because it will show you the stacks. For the CPU usage data, the chart is probably more helpful than the summary table.

Hope that helps!

Licensed under: CC-BY-SA with attribution
