Question

I have a nested XML that looks like this:

<data>foo <data1>hello</data1> bar</data>

I am using minidom, but no matter how I try to get the value inside "data", I only get "foo" and not "bar".

It is even worse if the XML is like this:

<data><data1>hello</data1> bar</data>

I only get "None", which makes sense given the behaviour above. So I came across this: http://levdev.wordpress.com/2011/07/29/get-xml-element-value-in-python-using-minidom and concluded that it is due to a limitation of minidom?
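The question doesn't show the reading code. Assuming the value was read with `firstChild.nodeValue` (a common minidom idiom, and an assumption here), this sketch reproduces both results. `firstChild` is only the first child node of `<data>`: the text node "foo " in the first document, and the `<data1>` element in the second, whose `nodeValue` is always `None`. The helper name `first_child_value` is hypothetical.

```python
from xml.dom import minidom

def first_child_value(xml_text):
    """Hypothetical helper: return nodeValue of the root element's first child."""
    data = minidom.parseString(xml_text).documentElement
    # firstChild is just the first child *node*, not all of <data>'s text
    return data.firstChild.nodeValue

print(repr(first_child_value("<data>foo <data1>hello</data1> bar</data>")))  # 'foo '
print(repr(first_child_value("<data><data1>hello</data1> bar</data>")))      # None
```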

So I used the method from that blog post, and now I get

foo <data1>hello</data1> bar

and

<data1>hello</data1> bar

which is acceptable. However, if I try to create a new node (createTextNode) using the output above as the node value, the XML becomes:

<data>foo &lt;data1&gt;hello&lt;/data1&gt; bar</data>

and

<data>&lt;data1&gt;hello&lt;/data1&gt; bar</data>
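The escaping is expected: `createTextNode` stores its argument as character data, not as markup, so `<` and `>` are escaped when the document is serialized. A minimal sketch reproducing the first case with the standard-library minidom:

```python
from xml.dom import minidom

doc = minidom.Document()
data = doc.createElement("data")
doc.appendChild(data)

# The string is kept as plain text, not parsed, so markup characters
# are escaped on output.
data.appendChild(doc.createTextNode("foo <data1>hello</data1> bar"))
print(data.toxml())  # <data>foo &lt;data1&gt;hello&lt;/data1&gt; bar</data>
```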

Is there any way that I can create it so that it looks like the original? Thank you.


Solution 3

As pointed out by @pandubear, the XML:

<data>foo <data1>hello</data1> bar</data>

does have two text nodes, containing "foo " and " bar", so the solution is to iterate through all the child nodes of data and collect the values of the text nodes.
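A minimal sketch of iterating the child nodes with minidom. The second half goes beyond this answer: to rebuild the element so it looks like the original, it copies the nodes themselves with `Document.importNode` instead of turning them into a string and calling `createTextNode`, which is what caused the escaping. Variable names are illustrative.

```python
from xml.dom import minidom

src = minidom.parseString("<data>foo <data1>hello</data1> bar</data>").documentElement

# Keep only the direct text children of <data>.
texts = [n.data for n in src.childNodes if n.nodeType == n.TEXT_NODE]
print(texts)  # ['foo ', ' bar']

# Rebuild <data> in a new document by copying nodes, not strings,
# so <data1> stays an element instead of escaped text.
out_doc = minidom.Document()
rebuilt = out_doc.createElement("data")
out_doc.appendChild(rebuilt)
for child in src.childNodes:
    # importNode(node, deep=True) copies the node and its subtree into out_doc
    rebuilt.appendChild(out_doc.importNode(child, True))
print(rebuilt.toxml())  # <data>foo <data1>hello</data1> bar</data>
```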

OTHER TIPS

You can use ElementTree. For XML it is very efficient for both retrieving and creating nodes.

Have a look at the link below:

ElementTree tutorials: mixed XML

Some examples of creating nodes:

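import xml.etree.ElementTree as ET

data = ET.Element('data')

data1 = ET.SubElement(data, 'data1', attr="value")
data1.text = "hello"       # text inside <data1>
data.text = "bar"          # text before the first child of <data>
data1.tail = "some code"   # text after </data1>, still inside <data>
ET.dump(data)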

Output: <data>bar<data1 attr="value">hello</data1>some code</data>
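The same text/tail model also covers reading your original XML. A sketch with the standard-library ElementTree: `.text` holds the text before the first child, and a child's `.tail` holds the text after it, so "foo " and " bar" are both reachable, and serializing the parsed tree reproduces the original markup.

```python
import xml.etree.ElementTree as ET

data = ET.fromstring("<data>foo <data1>hello</data1> bar</data>")
data1 = data.find("data1")

print(repr(data.text))   # 'foo '  (text before <data1>)
print(repr(data1.text))  # 'hello'
print(repr(data1.tail))  # ' bar'  (text after </data1>)

# itertext() yields every text fragment in document order.
print("".join(data.itertext()))  # foo hello bar

# Writing the tree back keeps the markup intact (no escaping).
print(ET.tostring(data, encoding="unicode"))
```

For `<data><data1>hello</data1> bar</data>`, `data.text` is simply `None` while `data1.tail` is still " bar".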

First of all, use the following function to pretty-print your XML so it is a lot easier to read:

import xml.dom.minidom as minidom
import xml.etree.ElementTree as et

def prettify(elem):
    """Return a pretty-printed XML string for the Element.  Credit goes
    to Maxime on Stack Overflow for this code."""
    rough_string = et.tostring(elem, 'utf-8')   # serialize the ElementTree element
    reparsed = minidom.parseString(rough_string)  # reparse with minidom
    return reparsed.toprettyxml(indent="\t")

That makes stepping through the tree visually a lot simpler.
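A usage sketch, repeating the helper (with the ElementTree import it needs) so the block runs on its own. One caution for mixed content like yours: `toprettyxml()` writes each child of an element on its own indented line, so the "foo " and " bar" text nodes pick up extra newlines and tabs. The pretty output is good for reading, but it should not be saved in place of the original data.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as et

def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    rough_string = et.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

data = et.fromstring("<data>foo <data1>hello</data1> bar</data>")
pretty = prettify(data)
print(pretty)

# Parsing the pretty output again shows the text nodes now carry
# added whitespace, i.e. the mixed content itself has changed.
text_nodes = [n.data for n in minidom.parseString(pretty).documentElement.childNodes
              if n.nodeType == n.TEXT_NODE]
print(text_nodes)
```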

Next, I would suggest a change to your XML that I think will make your life a lot easier.

Instead of:

<data>foo
    <data1>hello</data1>
    bar
</data>

Mixed content like this is actually well-formed XML, but it is awkward to work with. I would store your 'foo' and 'bar' as attributes of data instead, so it looks like this:

<data var1='foo' var2='bar'>
    <data1>hello</data1>
</data>

To do this using xml.etree.ElementTree:

import xml.etree.ElementTree as ET

data = ET.Element('data', {'var1': 'foo', 'var2': 'bar'})  # attributes as a dict
data1 = ET.SubElement(data, 'data1')
data1.text = 'hello'
print(prettify(data))  # prettify() is the helper defined above
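A sketch of why the attribute layout is easier to consume: after parsing, each value is a direct lookup with `Element.get` and `Element.findtext`, with no text/tail handling. It serializes with `ET.tostring` instead of `prettify` so it runs on its own.

```python
import xml.etree.ElementTree as ET

data = ET.Element('data', {'var1': 'foo', 'var2': 'bar'})
data1 = ET.SubElement(data, 'data1')
data1.text = 'hello'

xml_text = ET.tostring(data, encoding='unicode')
print(xml_text)  # <data var1="foo" var2="bar"><data1>hello</data1></data>

# Reading the values back is a plain lookup per value.
parsed = ET.fromstring(xml_text)
print(parsed.get('var1'), parsed.get('var2'), parsed.findtext('data1'))  # foo bar hello
```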
Licensed under: CC-BY-SA with attribution