Question
I have a nested XML that looks like this:
<data>foo <data1>hello</data1> bar</data>
I am using minidom, but no matter how I try to get the values inside "data", I only get "foo" and never "bar".
It is even worse if the XML is like this:
<data><data1>hello</data1> bar</data>
I only get "None", which is consistent with the behaviour above. Then I came across this: http://levdev.wordpress.com/2011/07/29/get-xml-element-value-in-python-using-minidom and concluded that this is a limitation of minidom?
So I used the method in that blog and I now get
foo <data1>hello</data1> bar
and
<data1>hello</data1> bar
which is acceptable. However, if I try to create a new node (createTextNode) using the output above as the node value, the markup is escaped and the XML becomes:
<data>foo &lt;data1&gt;hello&lt;/data1&gt; bar</data>
and
<data>&lt;data1&gt;hello&lt;/data1&gt; bar</data>
Is there any way that I can create it so that it looks like the original? Thank you.
Solution
As @pandubear pointed out, the XML:
<data>foo <data1>hello</data1> bar</data>
does have two text nodes, containing "foo " and " bar". So the fix is to iterate over all the child nodes of data and read the value of each one, rather than only the first.
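A minimal sketch of that iteration with minidom, assuming the XML from the question. The variable names are only illustrative. It also shows one possible way to rebuild the same mixed content in a new document by copying the child nodes with importNode, instead of passing markup to createTextNode (which escapes it):

```python
from xml.dom import minidom

doc = minidom.parseString("<data>foo <data1>hello</data1> bar</data>")
data = doc.documentElement

# Collect the data of every text node that is a direct child of <data>.
texts = [node.data for node in data.childNodes
         if node.nodeType == node.TEXT_NODE]
print(texts)  # ['foo ', ' bar']

# Rebuild the same content in a new document, one node at a time.
new_doc = minidom.Document()
new_data = new_doc.createElement("data")
new_doc.appendChild(new_data)
for node in data.childNodes:
    # importNode(node, True) makes a deep copy owned by new_doc.
    new_data.appendChild(new_doc.importNode(node, True))

rebuilt = new_data.toxml()
print(rebuilt)  # <data>foo <data1>hello</data1> bar</data>
```

For the second XML in the question, the same loop finds a single text node, " bar", because there is no text before data1.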
Other tips
You can use ElementTree for XML. It handles both reading and creating nodes well, including mixed content.
Have a look at the link below:
ElementTree tutorials: mixed XML
Here is an example of creating nodes:
import xml.etree.ElementTree as ET
data = ET.Element('data')
data1 = ET.SubElement(data, 'data1', attr="value")
data1.text = "hello"      # text inside <data1>
data.text = "bar"         # text in <data> before <data1>
data1.tail = "some code"  # text in <data> after </data1>
ET.dump(data)
Output: <data>bar<data1 attr="value">hello</data1>some code</data>
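Reading works the same way in reverse. As a rough sketch, assuming the XML from the question, the text before a child element is in the parent's text, and the text after it is in the child's tail:

```python
import xml.etree.ElementTree as ET

data = ET.fromstring("<data>foo <data1>hello</data1> bar</data>")
data1 = data.find("data1")

leading = data.text    # text before <data1>
trailing = data1.tail  # text after </data1>
print(repr(leading), repr(trailing))  # 'foo ' ' bar'

# itertext() walks all text in document order, including text and tail.
all_text = "".join(data.itertext())
print(all_text)  # foo hello bar

# Serializing again keeps the mixed content as it was.
roundtrip = ET.tostring(data, encoding="unicode")
print(roundtrip)  # <data>foo <data1>hello</data1> bar</data>
```

If there is no text before the first child, as in the second XML from the question, data.text is None.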
First of all, use the following function to pretty-print your XML so that it is a lot easier to read:
import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element. Props go
    to Maxime from stackoverflow for this code."""
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")
That makes stepping through the tree visually a lot simpler.
Next, I would suggest a change to your XML that I think will make your life a whole lot easier.
Instead of :
<data>foo
<data1>hello</data1>
bar
</data>
which is well-formed but awkward to process, I would store 'foo' and 'bar' as attributes of <data>, so that it looks like this:
<data var1='foo' var2='bar'>
<data1>hello</data1>
</data>
To do this using xml.etree.ElementTree:
import xml.etree.ElementTree as ET
data = ET.Element('data', {'var1': 'foo', 'var2': 'bar'})
data1 = ET.SubElement(data, 'data1')
data1.text = 'hello'
print(prettify(data))
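Reading the values back is then a plain attribute lookup. A small sketch, assuming the attribute names var1 and var2 from the example above; var3 is a made-up name used only to show the default value:

```python
import xml.etree.ElementTree as ET

data = ET.fromstring(
    "<data var1='foo' var2='bar'><data1>hello</data1></data>"
)

var1 = data.get("var1")         # 'foo'
var2 = data.get("var2")         # 'bar'
inner = data.findtext("data1")  # 'hello'

# get() returns the given default when the attribute does not exist.
missing = data.get("var3", "")
print(var1, var2, inner, repr(missing))  # foo bar hello ''
```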