Question

What would you use to alter an XML file while preserving as much of its layout as possible, including indentation and comments?

My problem is that I have a couple of massive hand-edited XML-files describing a user interface, and now I need to translate several attributes to another language.

I've tried doing this using Python + ElementTree, but it preserved neither whitespace nor comments.

I've seen XSLT being suggested for similar questions, but I don't think that is an alternative in this case, since I need to do some logic and lookups for each attribute.

It would be preferable if attribute order in each element is preserved as well, but I can tolerate changed order.


Solution

Any DOM manipulation module should suit your needs. Layout is just text data, so it is represented as text nodes in the DOM:

>>> from xml.dom.minidom import parseString
>>> dom = parseString('''\
... <message>
...   <text>
...     Hello!
...   </text>
... </message>''')
>>> dom.childNodes[0].childNodes
[<DOM Text node "'\n  '">, <DOM Element: text at 0xb765782c>, <DOM Text node "'\n'">]
>>> text = dom.getElementsByTagName('text')[0].childNodes[0]
>>> text.data = text.data.replace('Hello', 'Hello world')
>>> print(dom.toxml())
<?xml version="1.0" ?><message>
  <text>
    Hello world!
  </text>
</message>
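
Applied to the attribute translation from the question, the same approach might look like the sketch below. The lookup table, the attribute name `label`, and the sample document are all made up for illustration; the real per-attribute logic would replace the dictionary lookup. Comments and the whitespace between elements are kept as nodes and written back unchanged, but minidom regenerates the inside of each tag, so details such as quote style or spacing between attributes may be normalized.

```python
from xml.dom.minidom import parseString

# Hypothetical lookup table and attribute name, purely for illustration.
TRANSLATIONS = {"Open": "Abrir", "Close": "Cerrar"}
ATTRIBUTE = "label"

# Hypothetical hand-edited UI description with a comment and indentation.
SOURCE = """<ui>
  <!-- main menu -->
  <button label="Open"/>
  <button label="Close"/>
</ui>"""


def translate_attributes(xml_text, attribute=ATTRIBUTE, table=TRANSLATIONS):
    """Return xml_text with the given attribute translated on every element."""
    dom = parseString(xml_text)
    # "*" matches every element in the document.
    for element in dom.getElementsByTagName("*"):
        if element.hasAttribute(attribute):
            original = element.getAttribute(attribute)
            # Fall back to the original value when no translation exists.
            element.setAttribute(attribute, table.get(original, original))
    # Serialize only the root element, which skips the XML declaration.
    return dom.documentElement.toxml()


result = translate_attributes(SOURCE)
print(result)
```

Values missing from the table are left untouched, so the script can be rerun as the table grows.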

OTHER TIPS

If you use an XSLT processor such as xt, you can write extension functions in Java that perform whatever transformation you need.

Having said that, I have used Python's xml.dom.minidom module successfully for this sort of transformation. It does preserve whitespace and layout.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow