Question

Does anyone know about this compiler feature, profile-guided optimization (PGO)? It seems GCC supports it. How does it work? What is the potential gain? In which cases is it worthwhile? Inner loops?

(This question is specifically about PGO, not about optimization in general. Thanks.)


Solution

It works by inserting extra code that counts the number of times each code path is taken. You run this instrumented build on typical input, and when you compile a second time, the compiler uses what it learned about your program's actual execution, which it could only guess at before. There are a few things PGO can work toward:

  • Deciding whether or not a function should be inlined, depending on how often it is called.
  • Deciding which branch of an "if" statement should be treated as the likely one, based on the percentage of executions that go one way or the other.
  • Deciding how to optimize loops based on how many iterations they typically run each time they are entered.
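As a rough illustration of the two compiles, the sketch below counts branch outcomes by hand in a hypothetical `branch_counts` array; GCC's real `-fprofile-generate` instrumentation is emitted by the compiler itself and stores its counters differently. The second function shows the kind of information that flows back: GCC's `__builtin_expect` lets you mark a branch as unlikely by hand, which is roughly what PGO infers for you from the counts. The function names are illustrative only.

```c
#include <stddef.h>

/* Hypothetical counters standing in for the compiler's own instrumentation:
   index 0 counts the "value is negative" path, index 1 the other path. */
static unsigned long branch_counts[2];

/* First compile (conceptual): the same logic, with a counter bumped on
   each path so we learn how often each branch is actually taken. */
long sum_nonnegative_instrumented(const int *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0) {
            branch_counts[0]++;   /* expected to be the rare path */
            continue;
        }
        branch_counts[1]++;       /* expected to be the common path */
        sum += v[i];
    }
    return sum;
}

/* Second compile (conceptual): if the counts show negatives are rare, the
   compiler can lay out the code with that branch marked unlikely.
   __builtin_expect(expr, 0) states the same hint manually. */
long sum_nonnegative_hinted(const int *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (__builtin_expect(v[i] < 0, 0))
            continue;
        sum += v[i];
    }
    return sum;
}
```

In practice you rarely write either version by hand: with PGO the counting and the hinting both happen automatically, and the hints reflect measured behaviour rather than your guess.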

You never really know how much these things can help until you test them.
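One way to test it with GCC, assuming a single hypothetical source file `bench.c` and a hypothetical `BENCH_MAIN` macro that switches on the driver, is to build a plain binary and a PGO binary from the same source and time both. The workload is an illustrative loop with one heavily biased branch; the build steps use GCC's real `-fprofile-generate` and `-fprofile-use` flags. The instrumented and final builds deliberately share the output name `bench`, since newer GCC versions can derive the profile data file name from the output name.

```c
/*
 * Build-and-measure sketch (bench.c and BENCH_MAIN are hypothetical names):
 *
 *   gcc -O2 -DBENCH_MAIN -o bench_plain bench.c                  # baseline
 *   gcc -O2 -DBENCH_MAIN -fprofile-generate -o bench bench.c     # instrumented
 *   ./bench                                                      # training run, writes *.gcda
 *   gcc -O2 -DBENCH_MAIN -fprofile-use -o bench bench.c          # rebuild with profile
 *
 * Then run ./bench_plain and ./bench on the same input and compare times.
 */
#include <stdio.h>
#include <time.h>

/* Illustrative workload: a loop whose branch is taken on only 1 iteration
   in 64, the kind of bias a profile records. */
unsigned long long workload(unsigned n)
{
    unsigned long long acc = 0;
    for (unsigned i = 0; i < n; i++) {
        if (i % 64 == 0)
            acc += 1000;   /* rare path */
        else
            acc += 1;      /* common path */
    }
    return acc;
}

/* Runs the workload once, stores its result, and returns the elapsed
   CPU time in seconds as measured by clock(). */
double time_workload(unsigned n, unsigned long long *result)
{
    clock_t start = clock();
    *result = workload(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

#ifdef BENCH_MAIN
int main(void)
{
    unsigned long long result;
    double secs = time_workload(200000000u, &result);
    printf("result=%llu time=%.3fs\n", result, secs);
    return 0;
}
#endif
```

Keep the training run's input representative: the profile only helps if the branch and call frequencies it records match what the program sees in real use.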

OTHER TIPS

PGO gives x264, the project I work on, about a 5% speed boost, and we have a built-in system for it (make fprofiled). It's a nice free speed boost in some cases, and it probably helps more in applications that, unlike x264, contain less handwritten assembly.

Jason's advice is right on. The biggest speedups you are going to get come from "discovering" that you let an O(n²) algorithm slip into an inner loop somewhere, or that you can cache certain computations outside of expensive functions.
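A classic way an accidental O(n²) slips into a C inner loop is calling `strlen` in the loop condition: every iteration rescans the whole string. The sketch below (function names are illustrative) shows the slow version and the fix of caching the length outside the loop. An optimizer can sometimes hoist the call on its own, but that is not guaranteed, for example when the loop body writes through a pointer that might alias the string.

```c
#include <stddef.h>
#include <string.h>

/* Quadratic: strlen(s) is re-evaluated on every iteration, and each call
   walks the entire string, so the loop does O(n^2) work overall. */
size_t count_char_slow(const char *s, char c)
{
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == c)
            count++;
    return count;
}

/* Linear: the length is computed once and cached outside the loop. */
size_t count_char_fast(const char *s, char c)
{
    size_t count = 0;
    size_t len = strlen(s);   /* cached loop bound */
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            count++;
    return count;
}
```

No amount of PGO turns the first function into the second; that kind of fix comes from reading the profile and the code.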

Compared to the micro-optimizations that PGO can trigger, these are the big winners. Once you've done that level of optimization, PGO might be able to help. We never had much luck with it, though: the instrumentation overhead made our application unusably slow (by several orders of magnitude).

I like using Intel VTune as a profiler, primarily because it is non-invasive compared to instrumenting profilers, which change the program's behaviour too much.

The fun thing about optimization is that speed gains are found in the unlikeliest of places.

It's also the reason you need a profiler, rather than guessing where the speed problems are.

I recommend starting with a profiler (gprof if you're using GCC) and just poking around in the results of running your application through some normal operations.
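A minimal session with gprof, the GNU profiler that works with GCC's `-pg` option, might look like the sketch below, assuming a hypothetical source file `app.c` and a hypothetical `APP_MAIN` macro that enables the driver. Compile and link with `-pg`, run the program normally (it writes `gmon.out` on normal exit), then run `gprof` on the executable. The functions are an illustrative stand-in workload; `noinline` keeps them from being folded into their caller so they appear as separate entries in the flat profile.

```c
/*
 *   gcc -O2 -pg -DAPP_MAIN -o app app.c
 *   ./app                                 # writes gmon.out on normal exit
 *   gprof ./app gmon.out > profile.txt    # flat profile + call graph
 */
#include <stdio.h>

/* Cheap per-item work, called on every iteration. */
__attribute__((noinline)) static unsigned long cheap_step(unsigned long x)
{
    return x + 1;
}

/* Deliberately expensive per-item work: an inner loop (a simple
   polynomial hash, wrapping on overflow) that should dominate the profile
   even though it runs on only 1 item in 10. */
__attribute__((noinline)) static unsigned long expensive_step(unsigned long x)
{
    unsigned long acc = x;
    for (unsigned long j = 0; j < 1000; j++)
        acc = acc * 31 + j;
    return acc;
}

/* Stand-in for "some normal operations" of the application. */
unsigned long run_operations(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 0; i < n; i++) {
        total += cheap_step(i);
        if (i % 10 == 0)
            total += expensive_step(i);
    }
    return total;
}

#ifdef APP_MAIN
int main(void)
{
    printf("%lu\n", run_operations(1000000UL));
    return 0;
}
#endif
```

In the resulting report you would expect `expensive_step` near the top of the flat profile despite being called far less often than `cheap_step`, which is exactly the kind of surprise a profiler catches and guessing misses.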

Licensed under: CC-BY-SA with attribution