Question

I am using the Java odftoolkit library (simple-odf-0.6.6) for ODF document manipulation. We iterate over all documents in a loop:

    TextDocument textdoc = TextDocument.loadDocument(odtFileName);
    // ... changing content of document ...
    textdoc.save(anotherOdtFileName);
    textdoc.close();
    // then all resources/streams are correctly closed; my colleagues have checked that many times :)

As we iterate over thousands of documents, the Java app slowly consumes all available memory, and then everything slows down as the GC tries to free some memory. We do not get an OutOfMemoryError, though.

I tried tuning the JVM memory sizes and GC options (http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html); the application survives a few more minutes, but then all memory is consumed again.

This is a sample from the GC log, taken when the application has used up all available memory:

652.147: [Full GC 652.147: [Tenured: 454655K->454655K(454656K), 2.2387530 secs] 659263K->659216K(659264K), [Perm : 41836K->41836K(42112K)], 2.2388570 secs] [Times: user=2.25 sys=0.00, real=2.23 secs] 
654.387: [Full GC 654.387: [Tenured: 454656K->454656K(454656K), 2.2661510 secs] 659263K->659223K(659264K), [Perm : 41836K->41836K(42112K)], 2.2663190 secs] [Times: user=2.26 sys=0.00, real=2.26 secs] 
656.654: [Full GC 656.654: [Tenured: 454656K->454656K(454656K), 2.4117680 secs] 659263K->659229K(659264K), [Perm : 41836K->41836K(42112K)], 2.4118970 secs] [Times: user=2.41 sys=0.00, real=2.41 secs]

As you can see, each full GC releases only a few kB, and each one is very slow (over 2 seconds).
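
To make those numbers concrete: in each Full GC line the whole-heap figures are the before->after(capacity) triple that follows the Tenured section, so the first line reclaims 659263K - 659216K = 47K out of a 659264K heap, leaving it about 99.99% full. A small sketch that extracts these figures with a regular expression; the pattern is tailored to the -XX:+PrintGCDetails format shown above (an assumption about how the log was produced) and is not a general GC log parser. GcLogLine is a hypothetical name.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class GcLogLine {
    // Whole-heap figures: the "beforeK->afterK(capacityK)" triple that directly
    // follows "] " (the Tenured and Perm triples follow ": " instead).
    private static final Pattern HEAP = Pattern.compile("\\] (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns {beforeK, afterK, capacityK}, or null if the line has no such triple. */
    public static long[] heapFigures(String line) {
        Matcher m = HEAP.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), Long.parseLong(m.group(3))
        };
    }

    public static void main(String[] args) {
        String line = "652.147: [Full GC 652.147: [Tenured: 454655K->454655K(454656K), 2.2387530 secs] "
                + "659263K->659216K(659264K), [Perm : 41836K->41836K(42112K)], 2.2388570 secs] "
                + "[Times: user=2.25 sys=0.00, real=2.23 secs]";
        long[] h = heapFigures(line);
        long reclaimedK = h[0] - h[1];
        double percentFullAfter = 100.0 * h[1] / h[2];
        // prints: reclaimed 47K, heap 99.99% full after GC
        System.out.printf(Locale.ROOT, "reclaimed %dK, heap %.2f%% full after GC%n", reclaimedK, percentFullAfter);
    }
}
```

Run over all three lines, the "after" value barely moves while the capacity stays fixed, which is exactly the picture of a heap filled with objects that are still reachable.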

This jmap histogram shows the biggest consumers:

 num     #instances         #bytes  class name
----------------------------------------------
   1:       2535190       99077856  [C
   2:       2529791       60714984  java.lang.String
   3:         21085       27956544  [B
   4:        389820       16181680  [Ljava.lang.Object;
   5:        147111       13373896  [Ljava.util.HashMap$Entry;
   6:        108426       13180496  <constMethodKlass>
   7:        518834       12452016  java.util.HashMap$Entry
   8:        108426        9547136  <methodKlass>
   9:        321713        7721112  java.util.Vector
  10:        306308        7351392  org.apache.xerces.dom.AttributeMap
  11:        144353        6928944  java.util.HashMap
  12:         10230        5879960  <constantPoolKlass>
  13:         10230        5089344  <instanceKlassKlass>
  14:        114065        4562600  org.odftoolkit.odfdom.dom.attribute.text.TextStyleNameAttribute
  15:         58248        4193856  org.odftoolkit.odfdom.incubator.doc.text.OdfTextParagraph
  16:         90041        3601640  org.odftoolkit.odfdom.pkg.OdfAlienAttribute
  17:         36437        3459000  [I
  18:         48609        3110976  java.util.zip.ZipEntry
  19:          7454        2939616  <constantPoolCacheKlass>
  20:         36491        2627352  org.odftoolkit.odfdom.incubator.doc.style.OdfStyle
  21:         36399        2620728  org.odftoolkit.odfdom.incubator.doc.text.OdfTextSpan
  22:         58397        2335880  org.odftoolkit.odfdom.dom.attribute.style.StyleNameAttribute
  23:         65517        2096544  org.apache.xerces.dom.TextImpl
  24:         24270        1747440  org.odftoolkit.odfdom.incubator.doc.text.OdfTextListLevelStyleBullet
  25:         36511        1460440  org.odftoolkit.odfdom.dom.attribute.style.StyleFamilyAttribute
  26:         24335        1362760  org.odftoolkit.odfdom.dom.element.style.StyleParagraphPropertiesElement
  27:         24320        1361920  org.odftoolkit.odfdom.dom.element.style.StyleListLevelPropertiesElement
  28:         24320        1361920  org.odftoolkit.odfdom.dom.element.style.StyleListLevelLabelAlignmentElement
  29:         10933        1316952  java.lang.Class
  30:         29175        1167000  org.odftoolkit.odfdom.dom.attribute.style.StyleParentStyleNameAttribute
  31:         19464        1089984  org.odftoolkit.odfdom.dom.element.style.StyleFontFaceElement
  32:         68003        1088048  java.lang.Integer
  33:          3531        1082640  <methodDataKlass>
  34:         26757        1070280  org.odftoolkit.odfdom.dom.attribute.fo.FoMarginLeftAttribute
  35:         26752        1070080  org.odftoolkit.odfdom.dom.attribute.fo.FoTextIndentAttribute
  36:         24330         973200  org.odftoolkit.odfdom.dom.attribute.style.StyleWritingModeAttribute
  37:         24320         972800  org.odftoolkit.odfdom.dom.attribute.text.TextListLevelPositionAndSpaceModeAttribute
  38:         24320         972800  org.odftoolkit.odfdom.dom.attribute.text.TextLevelAttribute
  39:         24320         972800  org.odftoolkit.odfdom.dom.attribute.text.TextListTabStopPositionAttribute
  40:         24320         972800  org.odftoolkit.odfdom.dom.attribute.text.TextLabelFollowedByAttribute
  41:         24315         972600  org.odftoolkit.odfdom.dom.attribute.fo.FoLineHeightAttribute
  42:         24270         970800  org.odftoolkit.odfdom.dom.attribute.text.TextBulletCharAttribute
  43:         17064         955584  org.odftoolkit.odfdom.dom.element.style.StyleTextPropertiesElement
  44:         38372         920928  java.util.ArrayList
  45:         12135         873720  org.odftoolkit.odfdom.dom.element.text.TextAElement

As you can see, there are a lot of odftoolkit-related objects in memory.

Is there any effective way to deal with this problem? It would be great to be able to unload odftoolkit from our app at runtime and load it again to get rid of all its objects in memory (obviously they are all linked together, so the GC cannot do anything useful).

We are also considering running the critical code as a separate process for smaller groups of documents, but that does not solve the cause of the problem.


Solution

You most likely have a storage leak, either due to some problem in the library itself or because you are not using it appropriately. We would need a properly constructed minimal reproducible example to know which.

The poor performance you are seeing is classic "GC death spiral" behaviour, where the application spends more and more time running the GC and reclaiming less and less memory. It is likely that it would eventually lead to an OutOfMemoryError (OOME), after minutes or hours of the GC thrashing around.

The way to deal with the death spiral is to put a cap on the amount of time spent in garbage collection using the -XX:+UseGCOverheadLimit JVM switch. If the GC takes more than the designated proportion of time, the JVM proactively throws an OOME with the message "GC overhead limit exceeded". This is a GOOD THING ... generally speaking.
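
The overhead limit is a ratio of GC time to total time: in HotSpot's default configuration the OOME is thrown when more than 98% of the time goes to GC while less than 2% of the heap is recovered (tunable with -XX:GCTimeLimit and -XX:GCHeapFreeLimit). To watch that ratio yourself while processing a batch of documents, a minimal sketch using the standard java.lang.management API might look like this. GcOverhead and its methods are hypothetical names, and getCollectionTime() is an approximate, collector-dependent figure, so treat the result as a trend indicator rather than an exact measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: measures the share of wall-clock time spent in GC.
public final class GcOverhead {
    private GcOverhead() {}

    /** Accumulated GC time in milliseconds, summed over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction (0..1) of the elapsed time that was spent in GC while work ran. */
    public static double gcFractionDuring(Runnable work) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        work.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) gcMillis / elapsedMillis);
    }
}
```

Wrapping each batch, e.g. GcOverhead.gcFractionDuring(() -> processBatch(batch)) with a hypothetical processBatch method, and logging the result makes the death spiral visible as a fraction that creeps towards 1.0 from batch to batch.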

Then you try to track down the storage leak.


Tracking down Java storage leaks is covered by many resources. For starters, here is a StackOverflow Q&A on the topic:

The basic idea is to use a tool (and there are a number of them) to identify the objects that are leaking, and work back through the chain of object references to find out why they are reachable when they shouldn't be.
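
One such tool ships with the HotSpot JVM itself: a heap dump in .hprof format (the same format jmap -dump produces), which can be opened in Eclipse Memory Analyzer or VisualVM to follow the references from a leaked object back to a GC root. A minimal sketch for triggering a dump from inside the application, e.g. after every few hundred documents; it assumes a HotSpot-based JVM, Java 7 or later, because it relies on the JDK-specific com.sun.management.HotSpotDiagnosticMXBean. HeapDumper is a hypothetical name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; HotSpot-specific (com.sun.management), Java 7+.
public final class HeapDumper {
    private HeapDumper() {}

    /**
     * Writes a heap dump in .hprof format to the given file.
     * With live = true only objects reachable from GC roots are written,
     * which is what matters when looking for objects that should have been freed.
     */
    public static void dump(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap refuses to overwrite an existing file
        Files.deleteIfExists(file);
        bean.dumpHeap(file.toAbsolutePath().toString(), live);
    }
}
```

Comparing two dumps taken a few hundred documents apart, checking which odftoolkit classes keep growing, and then looking at their paths to GC roots is usually the quickest way to see what is still holding on to them.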

OTHER TIPS

You could write some tests to verify a few things:

  1. Is the library itself leaking in the simplest case? Use, e.g., code like this:

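    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.odftoolkit.simple.TextDocument;

    public void loadSaveDocument(String fileInName, String fileOutName) throws Exception {
        TextDocument textdoc = TextDocument.loadDocument(fileInName);
        // try-with-resources closes the output stream even if save() throws
        try (OutputStream fileOutStream = new FileOutputStream(fileOutName)) {
            textdoc.addParagraph("added text");
            textdoc.save(fileOutStream);
        } finally {
            textdoc.close(); // release the document's resources in every case
        }
    }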
    
  2. If yes, a pragmatic solution would be a workaround using a separate process, as suggested in the question.

  3. If no, are there specific documents causing a leak? Try running all documents through the code above.
  4. Or, if no specific documents are causing the leak, maybe a specific modification that you make to the documents is causing it?
  5. If not, check what differs between the simple code and your application code.
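
For step 1, the question "is this document leaking?" can be checked without a profiler by keeping only a WeakReference to the document after it has been closed and seeing whether the GC clears it. The sketch below is a hypothetical helper, not part of odftoolkit. System.gc() is only a hint, so a false result means "still reachable, or not yet collected". It also only checks that one object: a library that leaks data into a static cache without keeping the document itself reachable would not show up here, which is where the jmap histogram or a profiler comes in.

```java
import java.lang.ref.WeakReference;

// Hypothetical helper for checking whether an object can be garbage collected.
public final class LeakCheck {
    private LeakCheck() {}

    /** Like Supplier, but allowed to throw (TextDocument.loadDocument declares Exception). */
    @FunctionalInterface
    public interface ThrowingSupplier {
        Object get() throws Exception;
    }

    /**
     * Creates an object via factory, keeps only a weak reference to it and
     * reports whether the GC reclaimed it after a few System.gc() hints.
     */
    public static boolean isCollectable(ThrowingSupplier factory) throws Exception {
        WeakReference<Object> ref = new WeakReference<>(factory.get());
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(20);
        }
        return ref.get() == null;
    }
}
```

Applied to the simplest case, this might look like LeakCheck.isCollectable(() -> { TextDocument d = TextDocument.loadDocument(fileInName); d.close(); return d; }), where fileInName is any sample document; false on every run would point at something inside or around the library still referencing the closed document.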

I would also suggest using a profiler like JProfiler and taking a few snapshots. See the StackOverflow question "How to find memory leak in java using JProfiler?" for answers on how to use it.

You could also check the odf library project for any pre-existing issues with the version that you happen to be using.

Also, one workaround is to fork all your document processing into another VM, e.g. through Runtime.exec() or ProcessBuilder.
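
A minimal sketch of that workaround, using ProcessBuilder to start a fresh JVM per batch of documents so that everything a batch allocated is released when the child process exits. It assumes a hypothetical worker class (here com.example.OdtBatchWorker) whose main method runs the load/modify/save/close loop from the question over the file names it receives, and it reuses the parent's classpath; the class name and heap size are placeholders.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher: runs one batch of documents in a separate JVM.
public final class ChildJvmLauncher {
    private ChildJvmLauncher() {}

    /** Builds "java -Xmx<maxHeap> -cp <this classpath> <mainClass> <documents...>". */
    public static List<String> buildCommand(String mainClass, String maxHeap, List<String> documents) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeap);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.addAll(documents);
        return cmd;
    }

    /** Runs one batch and returns the child's exit code; its heap is gone once it exits. */
    public static int runBatch(String mainClass, String maxHeap, List<String> documents)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(mainClass, maxHeap, documents))
                .inheritIO() // child's stdout/stderr go to this console
                .start();
        return p.waitFor();
    }
}
```

The parent would split the document list into batches and call runBatch("com.example.OdtBatchWorker", "512m", batch) for each one. JVM startup is paid once per batch, so larger batches amortise it, while smaller batches bound how much a leak can accumulate before the process exits.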

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow