Parsing (very) large XML files with XmlSlurper

https://stackoverflow.com/questions/9977418

28-05-2021
|

Question

I am kind of new to Groovy and I am trying to read a (quite) large XML file (more than 1Gb) using XmlSlurper, which is supposed to work wonders with large files due to the fact that it doesn't build the whole DOM in memory.

Nevertheless I keep getting "OutOfMemoryError : Java heap space" which makes me think that there obviously is something that I'm doing wrong. I tried increasing the Xmx setting but I would rather solve the problem since I may have to deal with even bigger files afterwards.

Here is the line of code I used:

def posts = new XmlSlurper().parse(new File("posts.xml"))

Any hint on what's wrong ?

Thanks in advance,

Jérémie.

Solution

Groovy's XmlSlurper is a SAX parser, but loads the entire model into memory...

To avoid OOM exceptions, you probably need to either up your memory allowance (as you say, using the -Xmx setting), or you can write your own SAX parser to get just the data you require from the document

OTHER TIPS

I'm a bit late to this party, but I've been having the same issue also.

I made a proposition to the groovy-user mailing list, actually proposing to add something that looks like the XML::Twig perl module to XmlSlurper.

def xpathSlurper = new XPathXmlSlurper2();    
def c = { twig, it ->      
    println it.text().trim();
    twig.purgeCurrent();
}
xpathSlurper.setTwigRootHandler(xpath, c);
def fdata = xpathSlurper.parse(new File("test.xml"));

I've attached the sample code here: http://groovy.329449.n5.nabble.com/first-step-toward-Xml-Twig-for-Groovy-groovy-util-XPathXmlSlurper2-groovy-td4923577.html

I hope this helps!

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow