The standard XML library treats a numeric character reference such as &#32; and a literal ' ' as equal. The parser resolves the reference as soon as you call fromstring(xml_text) directly, so the two can no longer be told apart afterwards. The only way to keep the reference intact is to translate it into something else before calling fromstring(), and translate it back afterwards.
import xml.etree.ElementTree
# Hide numeric character references from the parser, then restore them afterwards.
# "|STOP_ESCAPE|" is an arbitrary placeholder; it must not occur in the real data.
stop_escape = lambda text: text.replace("&#", "|STOP_ESCAPE|")
resume_escape = lambda text: text.replace("|STOP_ESCAPE|", "&#")
xml_text = """
<root>
<mytag>
data_with_space&#32;
</mytag>
</root>"""
root = xml.etree.ElementTree.fromstring(stop_escape(xml_text))
mytag_txt = resume_escape(root.find("mytag").text)
print("original text:", repr(mytag_txt))
print("stripped text:", repr(mytag_txt.strip()))
You would get:
original text: '\ndata_with_space&#32;\n'
stripped text: 'data_with_space&#32;'
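For comparison, here is a minimal sketch of what happens when the text is parsed directly, without any substitution. It assumes the reference in question is &#32; (the decimal reference for a space); the variable names with_reference and with_literal are only illustrative.

```python
import xml.etree.ElementTree as ET

# Two documents that differ only in how the trailing space is written.
with_reference = "<root><mytag>data_with_space&#32;</mytag></root>"
with_literal = "<root><mytag>data_with_space </mytag></root>"

text_a = ET.fromstring(with_reference).find("mytag").text
text_b = ET.fromstring(with_literal).find("mytag").text

# The parser has already resolved the reference, so both are the same string.
print(repr(text_a))
print(repr(text_b))
print(text_a == text_b)
```

Both repr() lines show a plain trailing space and the comparison is True, which is why the reference has to be hidden from the parser if you need to know how the space was written.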