Question

When I use YJP (YourKit Java Profiler) to run CPU tracing on our own product, it is really slow.

The product runs on a 16-core machine with an 8 GB heap, and I use Grinder to run a small load test (e.g. 10 Grinder threads) with about 7 to 10 steps during the profiling. I have a script that starts the product with the profiler, starts profiling (using the controller API), and then starts Grinder to emulate user operations. When all the operations finish, the script tells the profiler to stop profiling and save a snapshot.

During the profiling, each step in the Grinder test takes more than 1 million ms to finish. The whole run often takes more than 10 hours with just 10 Grinder threads, each running the test 10 times. Without the profiler, it finishes within 500 ms.

So... besides problems in the product being profiled, is there anything else that affects the performance of the CPU tracing process itself?


Solution

The last time I used YourKit (v7.5.11, which is pretty old; the current version is 12), it had two CPU profiling settings: sampling and tracing, the former being much faster and less accurate. Since tracing is supposed to be more accurate, I used it myself and also observed a huge slowdown, in spite of the statement that the slowdown was "average". Yet it was far less than in your results: from 2 seconds to 10 minutes. My code is a fragment of a calculation engine with virtually no I/O and no waits on anything; it just reads an input, calculates, and writes the result to the console. So the whole slowdown comes from the profiler, with no external influences.

Back to your question: the option mentioned, sampling vs. tracing, will affect the performance, so you may want to try sampling.

Now that I think of it: YourKit can be set up to do things automatically, such as taking snapshots periodically or on low memory, profiling memory usage, or recording object allocations. Each of these measures will make profiling slower. Perhaps you should run an online session instead of a script-controlled one, to see what it really does.

OTHER TIPS

According to the YourKit documentation:

Although tracing provides more information, it has its drawbacks. First, it may noticeably slow down the profiled application, because the profiler executes special code on each enter to and exit from the methods being profiled. The greater the number of method invocations in the profiled application, the lower its speed when tracing is turned on.

The second drawback is that, since this mode affects the execution speed of the profiled application, the CPU times recorded in this mode may be less adequate than times recorded with sampling. Please use this mode only if you really need method invocation counts.
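To make the per-call cost concrete, here is a rough sketch of the kind of bookkeeping a tracing profiler has to run on every method entry and exit. This is not YourKit's actual instrumentation; the `enter`/`exit` hooks and the `TracingSketch` class are hypothetical names made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class TracingSketch {
    // Hypothetical stand-ins for the data a tracing profiler keeps per method.
    static final Map<String, Long> callCounts = new HashMap<>();
    static final Map<String, Long> totalNanos = new HashMap<>();

    // Runs on every method entry: bump the invocation count, read the clock.
    static long enter(String method) {
        callCounts.merge(method, 1L, Long::sum);
        return System.nanoTime();
    }

    // Runs on every method exit: read the clock again and accumulate the time.
    static void exit(String method, long startNanos) {
        totalNanos.merge(method, System.nanoTime() - startNanos, Long::sum);
    }

    // A trivial method as it might look after instrumentation.
    // The hooks do far more work than the addition they wrap.
    static int tracedAdd(int a, int b) {
        long start = enter("add");
        try {
            return a + b;
        } finally {
            exit("add", start);
        }
    }

    public static void main(String[] args) {
        int sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum = tracedAdd(sum, 1);
        }
        System.out.println("sum=" + sum
                + " calls=" + callCounts.get("add")
                + " recordedNanos=" + totalNanos.get("add"));
    }
}
```

The smaller and more frequently called the method, the larger the relative overhead, which is why code with millions of tiny calls suffers most under tracing.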

Also:

When sampling is used, the profiler periodically queries stacks of running threads to estimate the slowest parts of the code. No method invocation counts are available, only CPU time.

Sampling is typically the best option when your goal is to locate and discover performance bottlenecks. With sampling, the profiler adds virtually no overhead to the profiled application.
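For comparison, sampling can be sketched in a few lines with the standard `Thread.getAllStackTraces()` call. This is only a toy that samples its own JVM and counts the top frame of each runnable thread; it is an assumption about the general technique, not a description of how YourKit implements it, and the `SamplingSketch` class is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

final class SamplingSketch {
    // Count the top frame of one captured stack; empty stacks are skipped.
    static void record(StackTraceElement[] stack, Map<String, Integer> hits) {
        if (stack.length == 0) {
            return;
        }
        String key = stack[0].getClassName() + "." + stack[0].getMethodName();
        hits.merge(key, 1, Integer::sum);
    }

    // Take one sample of every RUNNABLE thread in this JVM.
    static void sampleOnce(Map<String, Integer> hits) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (e.getKey().getState() == Thread.State.RUNNABLE) {
                record(e.getValue(), hits);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> hits = new HashMap<>();
        // 50 samples, 10 ms apart; the cost does not depend on how many
        // method calls the application makes between samples.
        for (int i = 0; i < 50; i++) {
            sampleOnce(hits);
            Thread.sleep(10);
        }
        hits.forEach((method, n) -> System.out.println(n + "  " + method));
    }
}
```

Note that this toy only looks at RUNNABLE threads, so it ignores blocked time, which ties into the wall-clock point below.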

Also, it's a little confusing what the doc means by "CPU time", because it also talks about "wall-clock time". If you are doing any I/O, waits, sleeps, or any other kind of blocking, it is important to get samples on wall-clock time, not CPU-only time, because it's dangerous to assume that blocked time is either insignificant or unavoidable. Fortunately, that appears to be the default (though it's still a little unclear):

The default configuration for CPU sampling is to measure wall time for I/O methods and CPU time for all other methods.

"Use Preconfigured Settings..." allows to choose this and other presents. (sic)

If your goal is to make the code as fast as possible, don't be concerned with invocation counts and measurement "accuracy"; do find out which lines of code are on the stack a large fraction of the time, and why. More on all that.
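As a minimal sketch of "on the stack a large fraction of the time": given a set of stack samples, count the share of samples in which each frame appears anywhere on the stack. The method names below are invented, and I use method names rather than individual lines of code just to keep the example short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class StackFraction {
    // For each frame, the fraction of samples in which it appears anywhere on the stack.
    static Map<String, Double> inclusiveFractions(List<List<String>> samples) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> stack : samples) {
            // De-duplicate so a recursive frame counts once per sample.
            for (String frame : new HashSet<>(stack)) {
                counts.merge(frame, 1, Integer::sum);
            }
        }
        Map<String, Double> fractions = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            fractions.put(e.getKey(), e.getValue() / (double) samples.size());
        }
        return fractions;
    }

    public static void main(String[] args) {
        // Hypothetical samples, outermost frame first.
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("main", "handle", "query"),
                Arrays.asList("main", "handle", "render"),
                Arrays.asList("main", "handle", "query"),
                Arrays.asList("main", "idle"));
        inclusiveFractions(samples)
                .forEach((frame, f) -> System.out.println(frame + " " + f));
    }
}
```

A frame that shows up in, say, half of the samples is responsible for roughly half of the elapsed time, regardless of how many times it was invoked.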

Licensed under: CC-BY-SA with attribution