Question

I'm using the JVM for a scientific application. The first step in my process is to load a lot of data into little double[] arrays (48-element arrays for each node in a large graph). Long before I get to the point where I find out if I have enough memory to load all of them, Java slows down asymptotically, and jvisualvm tells me that this is because nearly all of the CPU time is spent in garbage collection:

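[Screenshot: jvisualvm monitor, with CPU time spent in garbage collection in the left plot and "heap size" / "used heap" in the right plot]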

The first minute or so is fine: "used heap" (right plot) jumps up and down as it grows because some objects are temporary (I wrote this in Scala) and some objects are permanent. After that, however, the data-loading grinds to a halt because the garbage collector is apparently checking the same objects over and over (left plot). It must be expecting them to go out of scope, but I'm keeping them in scope because I want to use them for my analysis.

I know that the garbage collector puts objects in different generations, based on their likelihood of survival. The first generation contains objects that are recently created and likely to die soon; later generations are progressively more likely to be long-lived. If my objects are wrongly in the first generation, is there any way to tell the garbage collector that they ought to be in a later generation? I know that I'll be keeping them, so how can I tell the garbage collector?

Although I'd like these objects to be in a more permanent generation, PermGen would be too far: they will die eventually, after tens of minutes of processing. (I want to use this in a Hadoop reducer, which might work on a different chunk of data after this one without a new JVM.)

Note: I'm using the Sun HotSpot VM:

% java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

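Correction (to a previous edit): Changing -Xmx does change the saturation point, but Java does not apply the -Xmx option if it is passed after the -jar argument: everything after the jar file name is handed to the program as an ordinary argument. Note also that the value needs a unit suffix such as m, since a bare number is read as bytes. That is, do

java -Xmx2048m -jar MyJarFile.jar

rather than

java -jar MyJarFile.jar -Xmx2048m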

Because of this, I was incorrectly diagnosing the behavior with respect to maximum heap and all the answers pointing to the -Xmx flag are valid.

The saturation point I describe happens when the "heap size" (orange on right plot) reaches the chosen -Xmx limit, and the "heap size" is always about 1.6 times the "used heap" (blue on right plot) unless you explicitly set the size of the "Old" generation with -XX:NewRatio or -XX:OldSize. These also need to be before the -jar argument, and they provide a lot of control.
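Since misplaced flags are silently ignored, it can help to confirm from inside the JVM that -Xmx and the generation-sizing options actually took effect. The following is a minimal sketch using the standard java.lang.management API; the class name HeapReport is made up for illustration, and the pool names it prints depend on which collector is in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper class (not part of the original question).
public class HeapReport {

    // Formats a byte count as megabytes; the JVM reports -1 when a limit is undefined.
    public static String toMb(long bytes) {
        return bytes < 0 ? "n/a" : (bytes / (1024 * 1024)) + " MB";
    }

    // Builds one line per heap memory pool (young and old generation spaces).
    public static String describeHeapPools() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            sb.append(String.format("%-25s used=%s committed=%s max=%s%n",
                    pool.getName(),
                    toMb(usage.getUsed()),
                    toMb(usage.getCommitted()),
                    toMb(usage.getMax())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // maxMemory() reflects the effective -Xmx (it may be slightly below the requested value).
        System.out.println("Max heap: " + toMb(Runtime.getRuntime().maxMemory()));
        System.out.print(describeHeapPools());
    }
}
```

Running this once with the options before -jar and once with them after should make the difference visible in the reported maximums.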


Solution 2

I think you should check this with the VisualGC plugin for JVisualVM, so that you can see how the different generations are used. Based on the screenshots, it seems that the old generation has filled up (the heap as a whole is not completely full, yet the GC is working hard), so the GC is having a hard time freeing memory. You should either increase the heap or tune the size of the generations with -XX:NewRatio. You can also try adjusting the tenuring threshold to control when an object is considered "old".
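If installing a plugin is not convenient, a rough view of the same information is available from the standard GarbageCollectorMXBean API. The sketch below is only an illustration (the class name GcStats and the array count are made up): it prints how many collections each collector has run and how long they took, before and after allocating a batch of retained 48-element arrays like those in the question. Collector names vary by JVM and settings; with the parallel collector, for example, they are "PS Scavenge" for the young generation and "PS MarkSweep" for the old one.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class for illustration.
public class GcStats {

    // One line per collector: cumulative collection count and cumulative time.
    public static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%-24s collections=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Before loading:");
        System.out.print(snapshot());

        // Retain many small arrays, mimicking the per-node data (about 40 MB here).
        double[][] kept = new double[100000][];
        for (int i = 0; i < kept.length; i++) {
            kept[i] = new double[48];
        }

        System.out.println("After loading " + kept.length + " arrays:");
        System.out.print(snapshot());
    }
}
```

A rapidly growing count and time for the old-generation collector, with little memory reclaimed, would be consistent with the diagnosis above.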

Other tips

The GC should not be invoking itself in a spiral unless your heap is approaching a saturation condition. You need to increase your maximum heap size (-Xmx); start with something approaching 2x your expected retention. You can also use the CMS collector, which can improve the situation with a large tenured set. You will also likely need to tune your new generation manually, as the old generation should not need to be swept on a regular basis.

You can also consider using NIO direct ByteBuffers. While they are designed for more efficient I/O operations, they can be a reasonable choice for very long-lived, large arrays.
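As a rough illustration of the direct-buffer idea, the sketch below stores all 48-element records in one direct buffer instead of one double[] per node. The class name DirectNodeStore and its methods are invented for this example. It assumes every node has exactly 48 values and that the whole data set fits in a single buffer, which is capped at Integer.MAX_VALUE bytes; larger data sets would need several buffers. Direct memory is allocated outside the Java heap, so the collector does not have to scan the contents, but the total is limited separately (on HotSpot by -XX:MaxDirectMemorySize).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Hypothetical class for illustration: fixed-width double records in off-heap memory.
public class DirectNodeStore {
    public static final int VALUES_PER_NODE = 48;
    private static final int BYTES_PER_DOUBLE = 8;

    private final DoubleBuffer data;

    public DirectNodeStore(int nodeCount) {
        long bytes = (long) nodeCount * VALUES_PER_NODE * BYTES_PER_DOUBLE;
        if (nodeCount < 0 || bytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Does not fit in one direct buffer: " + bytes + " bytes");
        }
        // Direct buffers are zero-filled on allocation.
        data = ByteBuffer.allocateDirect((int) bytes)
                .order(ByteOrder.nativeOrder())
                .asDoubleBuffer();
    }

    // Copies one node's values into its slot.
    public void put(int node, double[] values) {
        if (values.length != VALUES_PER_NODE) {
            throw new IllegalArgumentException("Expected " + VALUES_PER_NODE + " values");
        }
        // Use a duplicate view so the shared buffer's position is never modified.
        DoubleBuffer view = data.duplicate();
        view.position(node * VALUES_PER_NODE);
        view.put(values);
    }

    // Reads a single value of a node without creating any heap object.
    public double get(int node, int index) {
        return data.get(node * VALUES_PER_NODE + index);
    }

    public static void main(String[] args) {
        DirectNodeStore store = new DirectNodeStore(1000);
        double[] values = new double[VALUES_PER_NODE];
        for (int i = 0; i < values.length; i++) {
            values[i] = i * 0.5;
        }
        store.put(7, values);
        System.out.println(store.get(7, 47));
    }
}
```

The trade-off is that the data is no longer a set of ordinary arrays, so analysis code has to read values through the store rather than from a double[].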

Objects aren't garbage collected if they are still being referenced. So just keep a reference to objects until you want them to be garbage collected.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow