Question

I am profiling an application using sampling in VS 2012 (although the exact profiler doesn't matter much). I have a good lead on where the performance bottleneck lies; however, I'm hampered by the fact that there are a lot of memory allocations going on, and the garbage collector seems to be significantly skewing my profiling results (I can somewhat see the GC effects in CLR Profiler and the Concurrency Visualizer).

Is there a way to somehow get rid of the samples acquired while the GC is running? I could use any of these:

  • Ignore samples collected while a GC is running (filter by function pointer?)
  • Separate the time spent in the GC from the time spent actually working
  • Increase GC limits to effectively "turn it off" for profiling
  • Actually turn off the GC (see the sketch after this list for how far the CLR will let you go)
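
For the record, the CLR does not let you truly turn the GC off, but you can get close enough for a measurement. Below is a minimal sketch of my own (not part of the original exchange) along the lines of the last two bullets: collect up front, raise the latency mode so the runtime defers collections, and use GC.CollectionCount to detect whether a collection landed inside the measured region anyway. RunWorkloadUnderTest is a hypothetical stand-in for the code being profiled. (.NET Framework 4.6+ also has GC.TryStartNoGCRegion, but that postdates the .NET 4.5 that shipped with VS 2012.)

    using System;
    using System.Diagnostics;
    using System.Runtime;

    class GcQuietRun
    {
        static void Main()
        {
            // Clean up front so a pending collection doesn't land inside the measured region.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            // Ask the runtime to defer collections as long as it can (.NET 4.5+).
            GCLatencyMode oldMode = GCSettings.LatencyMode;
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;

            // Every collection includes gen 0, so the gen-0 count counts them all.
            int gen0Before = GC.CollectionCount(0);
            Stopwatch sw = Stopwatch.StartNew();
            try
            {
                RunWorkloadUnderTest();   // hypothetical stand-in for the hot path
            }
            finally
            {
                sw.Stop();
                GCSettings.LatencyMode = oldMode;
            }

            // If this is non-zero, the GC ran during the measurement and the numbers are contaminated.
            int gcDuring = GC.CollectionCount(0) - gen0Before;
            Console.WriteLine("Elapsed: {0} ms, collections during run: {1}", sw.ElapsedMilliseconds, gcDuring);
        }

        static void RunWorkloadUnderTest()
        {
            // the code you actually want to measure goes here
        }
    }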

The problem is that I don't really know what I need to optimize. Attempts at easing the GC load by reducing allocations had very little real impact on release builds run without a debugger attached, so I want to know how much of the profiling result is an artifact (disabled optimizations, GC overhead) and how much is code that could genuinely be improved (the code in question is used by a large portion of our projects, so even a 10% performance improvement would have a huge impact).

Solution

I would suggest you back off and try a different approach. Here's how it works:

There is a speed bug in your program. (It is very unlikely there is not.) If you find it and fix it, you will save some fraction of time. Suppose it is 50%.

That means that if you just run it under the IDE and pause it manually while you're waiting for it, there is a 50% chance you will have stopped it during the time you would save. Find out what it's doing and why, by looking at the call stack, each line of code on the call stack, and maybe the data.

Do this a small number of times, like 5, 10, or 20, depending on what you see. You will see it performing the speed bug in about 50% of those samples, guaranteed.
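
To put numbers behind "guaranteed" (my own arithmetic, not from the original answer): if the speed bug accounts for a fraction f of the run time, the chance that all n random pauses miss it is (1 - f)^n.

    P(all n pauses miss the bug) = (1 - f)^n
    f = 0.5, n = 5  :  0.5^5  = 0.03125   (~97% chance at least one pause lands in it)
    f = 0.5, n = 10 :  0.5^10 ~ 0.001     (~99.9% chance)
    Expected hits ~ f * n, i.e. roughly half of your pauses.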

This will tell you some things that the profiler will not, such as:

  • If the speed bug is that you are doing lots of "new"s that you could avoid by re-using objects, it will show you the exact line(s) where that is happening, and the reason why (see the pooling sketch after this list). A sampling profiler can give you line-level inclusive time, but it cannot tell you the reason the time is being spent, and without knowing the reason you can't be sure the time isn't needed. OTOH, if a sample lands inside the GC, ignore it and look for the "new"s instead, because "new" is expensive too and it is what causes the GC in the first place.

  • If the speed bug is that you are actually doing file I/O, network access, or sleeps deep inside some library routine you didn't know about, it will tell you that and why, and you can figure out a way around it. The sampling profiler will not tell you this because it is a "CPU profiler", meaning it effectively goes to sleep whenever your program is blocked (a tiny wall-clock demonstration follows the pooling sketch below). If you switch to the instrumented profiler, you lose line-level precision. Either way, it will not tell you the reason the time is being spent.
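
On the first point, here is a bare-bones sketch of the "re-use instead of new" idea (the names are made up for the example; it is an illustration, not a recommended pool design). The payoff is that the hot path stops producing garbage, so the GC has nothing to do while you measure it:

    using System.Collections.Generic;

    // Hypothetical single-threaded buffer pool: rent a buffer, return it when done,
    // and the same arrays get recycled instead of becoming garbage.
    sealed class BufferPool
    {
        private readonly Stack<byte[]> _free = new Stack<byte[]>();
        private readonly int _bufferSize;

        public BufferPool(int bufferSize)
        {
            _bufferSize = bufferSize;
        }

        public byte[] Rent()
        {
            // Reuse a returned buffer if one is available; allocate only when the pool is empty.
            return _free.Count > 0 ? _free.Pop() : new byte[_bufferSize];
        }

        public void Return(byte[] buffer)
        {
            _free.Push(buffer);
        }
    }

A loop that previously did "new byte[64 * 1024]" on every iteration would Rent() at the top and Return() at the bottom, so allocation (and the GC work it triggers) happens only for the first few iterations.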

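On the second point, a tiny demonstration of why blocked time is invisible to a CPU sampler but obvious to a wall clock (Thread.Sleep stands in for the hidden I/O or lock wait):

    using System;
    using System.Diagnostics;
    using System.Threading;

    class WallClockVersusCpu
    {
        static void Main()
        {
            Stopwatch sw = Stopwatch.StartNew();

            Thread.Sleep(2000);   // stand-in for file/network/lock waits deep inside a library

            sw.Stop();
            // The wall clock reports the full ~2000 ms, but the thread burned almost no CPU,
            // so a CPU-sampling profiler attributes almost nothing to this spot.
            Console.WriteLine("Wall-clock elapsed: {0} ms", sw.ElapsedMilliseconds);
        }
    }
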
You may have to endure some derision if you try this, but it will get you the results you want. What's more, if you find and fix that 50% speed bug, the program becomes 2x faster. That makes further speed bugs easier to find: for example, if there was initially a 25% speed bug in addition to the 50% one, it now accounts for 50% of the remaining time, and if you find and fix it too you will be 4x faster overall. It may surprise you how far you can keep going this way; when you can't find anything more, you will be close to optimal.
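
Spelled out, assuming the two speed bugs don't overlap:

    total run time               = 1.00  (0.50 bug A + 0.25 bug B + 0.25 necessary work)
    after fixing A: 1.00 - 0.50  = 0.50  ->  2x faster; B is now 0.25 / 0.50 = 50% of what's left
    after fixing B: 0.50 - 0.25  = 0.25  ->  4x faster than where you started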

Licensed under: CC-BY-SA with attribution