How do I print a groovy Node with namespace preserved?

https://stackoverflow.com/questions/227447

03-07-2019
|

Question

When I use this code to output some XML I parsed (and modified) with XmlParser

XmlParser parser = new XmlParser()
def root = parser.parseText(feedUrl.toURL().text)
def writer = new StringWriter()
new XmlNodePrinter(new PrintWriter(writer)).print(root)
println writer.toString()

the namespace declarations on the root node are not printed, even though they are there in the toString() of root... any ideas?

Solution

It looks like it's denormalizing the output and including the namespace context along with the nodes that actually need the namespace context.

For example, the webpage for this question comes in with creativeCommons namespace embedded:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:thr="http://purl.org/syndication/thread/1.0">
  <!-- snip -->
  <creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5/rdf</creativeCommons:license>
  <!-- snip -->
</feed>

When you output the xml using this script:

def root = new XmlParser().parseText("http://stackoverflow.com/feeds/question/227447".toURL().text)
println new XmlNodePrinter().print(root)

It ends up moving the namespace to the license node that needs that namespace. Not a huge deal in this case as there is only a single node in that namespace. If most of the XML were namespaced, it'd probably bloat things quite a bit more.

<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- snip -->
    <creativeCommons:license xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
http://www.creativecommons.org/licenses/by-nc/2.5/rdf
  </creativeCommons:license>
  <!-- snip -->
</feed>

If you actually wanted the nodes normalized, you'd have to make some tweaks to the XmlNodePrinter to do 2 passes through the XML, first to gather all of the used namespaces and 2nd to output them at the top rather than within each namespaced node. The groovy source code is actually pretty readable and wouldn't be that hard to modify if you actually needed this.

OTHER TIPS

I've just had the same problem and after a bit of fiddling I've found a workaround.

You use the XmlSluper instead of the XmlParser and use StreamingMarkupBuilder instead of XmlNodePrinter. Then you take advantage of the closure in bind and use the mkp built-in variable to declare the namespaces.

For example; using the source xml example of Ted's from above:

def root = new XmlSlurper().parseText("http://stackoverflow.com/feeds/question/227447".toURL().text))
def outputBuilder = new StreamingMarkupBuilder()
String result = XmlUtil.serialize(outputBuilder.bind {
    mkp.declareNamespace('':'http://www.w3.org/2005/Atom')
    mkp.declareNamespace('creativeCommons':'http://backend.userland.com/creativeCommonsRssModule')
    mkp.declareNamespace('re':'http://purl.org/atompub/rank/1.0')
    mkp.yield root }
)
println result

Results in :

<?xml version="1.0" encoding="UTF-8"?><feed xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns="http://www.w3.org/2005/Atom" xmlns:re="http://purl.org/atompub/rank/1.0">
<title type="text">How do I print a groovy Node with namespace preserved? - Stack Overflow </title>
<link rel="self" type="application/atom+xml" href="http://stackoverflow.com/feeds/question/227447"/>
<link rel="alternate" type="text/html" href="http://stackoverflow.com/questions/227447"/>
<subtitle>most recent 30 from stackoverflow.com</subtitle>
<updated>2011-02-16T05:13:17Z</updated>
<id>http://stackoverflow.com/feeds/question/227447</id>
<creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5/rdf</creativeCommons:license>
<entry>
<id>http://stackoverflow.com/questions/227447/how-do-i-print-a-groovy-node-with-namespace-preserved</id>
<re:rank scheme="http://stackoverflow.com">2</re:rank>

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow