In my opinion, ElementTree is a good choice. If you need a more capable package in the future, you can switch to the third-party lxml module, which provides a largely compatible interface.
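As a rough sketch of what such a switch could look like, the import can be made to prefer lxml and fall back to the standard library. This assumes lxml is installed under its usual package name, and it uses only calls that both modules share (Element, SubElement, tostring); the element names are just placeholders.

```python
# Prefer lxml when it is available, otherwise use the standard library.
try:
    from lxml import etree as et          # third-party, optional
except ImportError:
    from xml.etree import ElementTree as et

# The same calls work with either module.
root = et.Element('AllItems')
et.SubElement(root, 'Item').text = 'Hello'

serialized = et.tostring(root)            # bytes in both implementations
print(serialized)
```

The rest of the code can then keep using the et alias without caring which implementation is behind it, as long as it sticks to the shared part of the API.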
The answer to your problem can be found in the documentation of ElementTree.write() (http://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.ElementTree.write), which says:
The output is either a string (str) or binary (bytes). This is controlled by the encoding argument. If encoding is "unicode", the output is a string; otherwise, it’s binary. Note that this may conflict with the type of file if it’s an open file object; make sure you do not try to write a string to a binary stream and vice versa.
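To see that rule in action without touching the disk, here is a small sketch that writes the same tree to in-memory streams. The io.StringIO and io.BytesIO objects stand in for files opened in text and binary mode, and the element names are only illustrative.

```python
import io
from xml.etree import ElementTree as et

root = et.Element('AllItems')
et.SubElement(root, 'Item').text = 'Hello World'
tree = et.ElementTree(root)

# encoding='unicode' produces str, so the target must be a text stream.
text_stream = io.StringIO()
tree.write(text_stream, encoding='unicode')
text_output = text_stream.getvalue()

# A real encoding name produces bytes, so the target must be a binary stream.
binary_stream = io.BytesIO()
tree.write(binary_stream, encoding='utf-8')
binary_output = binary_stream.getvalue()

# Mixing the two up fails: a binary stream does not accept str.
mismatch_error = None
try:
    tree.write(io.BytesIO(), encoding='unicode')
except TypeError as exc:
    mismatch_error = exc

print(type(text_output).__name__, type(binary_output).__name__)
print('mismatch raised:', type(mismatch_error).__name__)
```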
Basically, you are doing it correctly. You open() the file in text mode, so the file accepts strings, and you need to pass encoding='unicode' to tree.write(). Alternatively, you could open the file in binary mode (with no encoding argument to open()) and pass encoding='utf-8' to tree.write().
Here is a slightly cleaned-up version of the code that works on its own:
#!python3
from xml.etree import ElementTree as et


def dict_to_elem(dictionary):
    """Build an <Item> element with one child element per dictionary key."""
    item = et.Element('Item')
    for key, value in dictionary.items():
        # Spaces are not allowed in XML tag names, so strip them from the key.
        field = et.Element(key.replace(' ', ''))
        field.text = value
        item.append(field)
    return item


root = et.Element('AllItems')    # create the element first...
tree = et.ElementTree(root)      # and pass it to the created tree
root.append(dict_to_elem({'some_tag': 'Hello World', 'xxx': 'yyy'}))
# Lather, rinse, repeat this append step as needed

# Text mode: the file accepts str, so write with encoding='unicode'.
filename = 'a.xml'
with open(filename, 'w', encoding='utf-8') as file:
    tree.write(file, encoding='unicode')

# The alternative is binary mode: the file accepts bytes, so name the encoding.
fname = 'b.xml'
with open(fname, 'wb') as f:
    tree.write(f, encoding='utf-8')
Which of the two to use depends on the purpose. I personally prefer the first solution, because it clearly says that you are writing a text file (and XML is a text file).
But the simplest alternative, where you do not need to open the file yourself, is to pass the file name to tree.write() like this:
tree.write('c.xml', encoding='utf-8')
It opens the file, writes the content using the given encoding (updated after Sebastian's comment below), and closes the file. It is easy to read, and there is little room for mistakes.
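As a quick check of the file-name variant, the sketch below writes some non-ASCII text and parses the file back. The temporary directory, the sample text, and the xml_declaration=True argument are additions for illustration only; the declaration is optional, but it records the encoding inside the file itself.

```python
import os
import tempfile
from xml.etree import ElementTree as et

original_text = 'Grüße'                   # sample non-ASCII content
root = et.Element('AllItems')
et.SubElement(root, 'Item').text = original_text
tree = et.ElementTree(root)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'c.xml')

    # write() opens, encodes, and closes the file on its own.
    tree.write(path, encoding='utf-8', xml_declaration=True)

    # Look at the raw bytes, then parse the file back.
    with open(path, 'rb') as f:
        raw = f.read()
    parsed_text = et.parse(path).getroot().find('Item').text

print(raw)
print(parsed_text == original_text)
```

Note that leaving out the encoding argument does not give UTF-8: write() defaults to US-ASCII, in which case non-ASCII characters are written as numeric character references.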