Java heap bottleneck - how to identify the cause?

https://stackoverflow.com/questions/3334232

29-09-2020
|

Pergunta

I have a J2EE project running on JBoss, with a maximum heap size of 2048m, which is giving strange results under load testing. I've benchmarked the heap and cpu usage and received the following results (series 1 is heap usage, series 2 is cpu usage):

It seems as if the heap is being used properly and getting garbage collected properly around A. When it gets to B however, there appears to be some kind of a bottleneck as there is heap space available, but it never breaks that imaginary line. At the same time, at C, the cpu usage drops dramatically. During this period we also receive an "OutOfMemoryError (GC overhead limit exceeded)," which does not make much sense to me as there is heap space available.

My guess is that there is some kind of bottleneck, but what exactly I can't even imagine. How would you suggest going about finding the cause of the issue? I've profiled the memory usage and noticed that there are quite a few instances of the one class (around a million), but the total size of these instances is fairly small (around 50MB if I remember correctly).

Edit: The server is dedicated to to this application and the CPU usage given is only for the JVM (there should not be any significant CPU usage outside of the JVM). The memory usage is only for the heap, it does not include the permgen space. This problem is reproducible. My main concern is surrounding the limit encountered around B, for which I have not found a plausible explanation yet.

Conclusion: Turns out this was caused by a bunch of long running SQL queries being called concurrently. The returned ResultSets were also very large, possibly explaining the OOME. I still have no reasonable explanation for why there appears to be some limit at B.

Solução

From the error message it appears that the JVM is using the parallel scavenger algorithm for garbage collection. The message is dumped along with an OOME error when a lot of time is spent on GC, but not a lot of the heap is recovered.

The document from Sun does not specify if the 98% of the total time consumed is to be read as 98% of the CPU utilization of the process or that of the CPU itself. In either case, I have to draw the following inferences (with limited information):

The garbage collector or the JVM process does not have enough CPU utilization, most likely due to other processes consuming CPU at the same time.
The garbage collector does not have enough CPU utilization since it is a low priority thread, and another memory intensive (but not CPU intensive) thread in the JVM is doing work at the same time, which results in the failure to de-allocate memory.

Based on the above inferences (all, one or none of them could be true), it would be worthwhile to correlate the graph that you're obtained with the runtime behavior of the application as far as users are concerned. In other words, you might find it useful to determine if other processes are kicked off (when your problem occurs), or the part of the application that is in operation (again, when the problem occurs).

In any case, the page referenced above, does give an option to disable the GC overhead limit used by the GC algorithm.

EDIT: If the problem occurs periodically, and can be reproduced, it might turn out to be a memory leak, otherwise (i.e. it occurs sporadically), you are better off tuning the GC algorithm or even changing it.

Outras dicas

If I want to know where the "bottlenecks" are, I just get a few stackshots. There's no need to wonder and guess and play detective. They will just tell you.

Usually memory problems and performance problems go hand in hand, so if you fix the performance problems, you will also fix the memory problems (not for certain, though).

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow