Question

I'm using Python 2.6.2's xml.etree.cElementTree to create an XML document:

import xml.etree.cElementTree as etree
elem = etree.Element('tag')
elem.text = (u"Würth Elektronik Midcom").encode('utf-8')
xml = etree.tostring(elem,encoding='UTF-8')

At the end of the day, xml looks like:

<?xml version='1.0' encoding='UTF-8'?>
<tag>W&#195;&#188;rth Elektronik Midcom</tag>

It looks like tostring ignored the encoding parameter and encoded 'ü' in some other character encoding (I'm fairly sure 'ü' can be represented in UTF-8).

Any advice as to what I'm doing wrong would be greatly appreciated.
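The two character references in that output are not some other encoding of 'ü'. They are the two bytes of its UTF-8 encoding, 0xC3 and 0xBC (decimal 195 and 188), each written out as if it were a separate character: U+00C3 'Ã' and U+00BC '¼'. A quick check in Python 3 (not the 2.6 used above):

```python
# Python 3 sketch: where &#195; and &#188; come from.
utf8_bytes = "ü".encode("utf-8")

# The raw byte values of the UTF-8 encoding of 'ü'.
print(list(utf8_bytes))              # [195, 188]

# Reading each byte back as one character (Latin-1 maps byte N to U+00NN)
# gives the two characters that ended up in the XML.
print(utf8_bytes.decode("latin-1"))  # Ã¼
```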


Solution

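You're encoding the text twice: once yourself with .encode('utf-8'), and again when tostring serialises the element. Assign the unicode object directly and let tostring do the only encoding step: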

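# -*- coding: utf-8 -*-
import xml.etree.cElementTree as etree
elem = etree.Element('tag')
elem.text = u"Würth Elektronik Midcom"  # a unicode object; don't .encode() it yourself
xml = etree.tostring(elem, encoding='UTF-8')  # tostring encodes the text exactly once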
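For readers on Python 3, here is a minimal sketch of the same fix. It assumes the standard library's xml.etree.ElementTree, since cElementTree was deprecated in Python 3.3 and removed in 3.9. The variable raw is a hypothetical stand-in for text that arrives as UTF-8 bytes (from a file or socket, say). It is decoded once, so the element holds characters rather than bytes, and tostring does the single encoding step.

```python
import xml.etree.ElementTree as etree

# Hypothetical input: text that arrives already encoded as UTF-8 bytes.
raw = "Würth Elektronik Midcom".encode("utf-8")

elem = etree.Element("tag")
elem.text = raw.decode("utf-8")  # store characters (str), not bytes

# tostring performs the one and only encoding step and returns bytes.
data = etree.tostring(elem, encoding="UTF-8")
print(type(data))  # <class 'bytes'>
print(data)
```

Whether an XML declaration is prepended for encoding="UTF-8" has varied between Python 3 versions. The element content, however, comes out as raw UTF-8 bytes for 'ü' rather than character references.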

OTHER TIPS

etree.tostring(elem, encoding=str)

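returns a str rather than bytes in Python 3. lxml.etree accepts encoding=str for this; the standard library's xml.etree.ElementTree documents the spelling encoding="unicode" instead.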

You can also serialise to a Unicode string without an XML declaration by passing the unicode function (str in Python 3), or the name 'unicode', as the encoding. This changes the return value from a byte string to an unencoded Unicode string.
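Here is a minimal sketch of the string-returning variant using the Python 3 standard library (xml.etree.ElementTree rather than lxml) with encoding="unicode". The result is a str with no declaration. Encoding to bytes, if needed at all, happens once, explicitly, at the edge of the program.

```python
import xml.etree.ElementTree as etree

elem = etree.Element("tag")
elem.text = "Würth Elektronik Midcom"

# encoding="unicode" makes tostring return str and omit the XML declaration.
text = etree.tostring(elem, encoding="unicode")
print(type(text))  # <class 'str'>
print(text)        # <tag>Würth Elektronik Midcom</tag>

# Encode explicitly only where bytes are actually required.
payload = text.encode("utf-8")
```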

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow