Question

This question and its answer, which was recently tagged as an Epic Answer, have prompted me to wonder: can I measure the performance of a running Windows application in terms of its CPU branch prediction failures? I know that static analysis tools exist that might help optimize code for branch-prediction-friendly performance, and that manual techniques can help by simply making changes and re-testing, but I'm looking for an automatic mechanism that can report the total number of branch prediction failures over a period of time as a Windows application runs. I'm hoping that some profiler tool for Visual C++ could help me.

For the sake of this question, the application is built with a native compiler such as Visual C++ for Windows, or with some other native compiler such as GCC, FreePascal, Delphi, or Turbo Assembler. The executable may not have any debug information at all. I want to know if I can detect and count branch prediction failures, perhaps by reading internal CPU information through some Windows facility like WMI, or perhaps by running a completely virtualized Windows environment with my test application inside something like VirtualBox and doing runtime analysis of the virtual CPU. Or perhaps by some other technique that I don't know of; hence this question.

Yes, I googled. The only thing that looks promising is this PDF from AMD. Page 18 mentions something very close to what I'd like to do, but it seems to be written for people working without an operating system, on raw evaluation hardware:

5.1 Branches

Applicability. Conditional branch mispredictions may be a significant issue in code with a lot of decision-making logic. Conditional branches may be mispredicted when the likelihood of choosing the true or false path is random or near a 50-50 split. The branch prediction hardware cannot "learn" a pattern and branches are not predicted correctly.

Collection. Collect the events in this table to measure branch prediction performance:

Branches. Compute the rate at which branches are taken and the ratio of the number of instructions per branch using these formulas:

    Branch taken rate = Taken_branches / Ret_instructions
    Branch taken ratio = Taken_branches / Branches
    Instructions per branch = Ret_instructions / Branches
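
To make those formulas concrete, here is a small C++ calculation applying them; the raw event counts below are invented purely for illustration and do not come from any real measurement:

    #include <cstdint>
    #include <cstdio>

    int main() {
        // Hypothetical raw PMU event counts -- invented for illustration only.
        const std::uint64_t ret_instructions = 1000000000; // retired instructions
        const std::uint64_t branches         = 150000000;  // retired branches
        const std::uint64_t taken_branches   = 90000000;   // retired taken branches
        const std::uint64_t mispredicted     = 12000000;   // retired mispredicted branches

        // The ratios from the AMD guide quoted above:
        std::printf("Branch taken rate       = %.4f\n",
                    double(taken_branches) / double(ret_instructions));
        std::printf("Branch taken ratio      = %.4f\n",
                    double(taken_branches) / double(branches));
        std::printf("Instructions per branch = %.2f\n",
                    double(ret_instructions) / double(branches));

        // And the figure this question is ultimately after:
        std::printf("Branch misprediction rate = %.2f%%\n",
                    100.0 * double(mispredicted) / double(branches));
        return 0;
    }

With these made-up numbers, 8% of branches would be mispredicted (12,000,000 / 150,000,000).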

Update: I guess I could say I'm looking for a way to read the Intel Core i7's PMU (performance monitoring unit), or the equivalent on other CPUs. It looks like Intel VTune (suggested in the comments by Adrian) is very close to what I asked for.
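
To illustrate what reading the PMU directly would involve: Visual C++ exposes the RDPMC instruction through the __readpmc intrinsic. The sketch below is conceptual only; it assumes a kernel-mode driver has already programmed counter 0 to count mispredicted branches and has enabled user-mode RDPMC (the CR4.PCE flag), neither of which holds on stock Windows, where this call would simply raise a privileged-instruction exception. That gap is exactly what profilers like VTune fill with their own kernel driver.

    #include <intrin.h>
    #include <cstdio>

    int main() {
        // Assumption: counter 0 was programmed (by a kernel driver) to count
        // mispredicted retired branches, and user-mode RDPMC is enabled.
        unsigned __int64 before = __readpmc(0);

        // ... code under test goes here ...

        unsigned __int64 after = __readpmc(0);
        std::printf("Branch mispredictions in region: %llu\n", after - before);
        return 0;
    }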

Solution

VTune Performance Analyzer can do it! By the way, if you are studying these topics, take a look at the "Optimization Cookbook" from Intel Press.

Note: the comments give the same answer, but with some uncertainty. I have used VTune myself and measured the branch prediction rate on an Intel CPU, so I'm 100% sure.

Here is the link for VTune.

Here is the link for the book.
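
If you go the VTune route, note that VTune ships the ITT API (ittnotify.h), which lets you confine collection to just the region you care about. A minimal sketch, assuming the application is launched from VTune with collection set to start paused, and linked against the ittnotify library:

    #include <ittnotify.h>

    // Hypothetical stand-in for the workload being measured.
    void code_under_test();

    int main() {
        // Only the region between resume and pause contributes events
        // (e.g. branch mispredictions) to the VTune result.
        __itt_resume();   // begin collecting
        code_under_test();
        __itt_pause();    // stop collecting
        return 0;
    }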

Licensed under: CC-BY-SA with attribution