How to detect the root xml element of <?xml version="1.0" encoding="UTF-8"?> using Python and ElementTree

StackOverflow https://stackoverflow.com/questions/21947632

I am parsing an XML file whose root element I expect to be <data>. However, some users have modified these files and added the line <?xml version="1.0" encoding="UTF-8"?> at the top. I want to check whether it exists and, if so, fail my test to notify the user of the issue. I tried the following, but it keeps detecting the proper root element, <data>. Here is what I have so far.

<?xml version="1.0" encoding="UTF-8"?>
<data>
</data>

elementTree = self.param2
root = elementTree.find('.')

print(root.tag)

What I get to print out is:

data 

(which is not what I expected).

Any ideas would be appreciated!


Solution

Well, I appreciate all the responses. However, I didn't want to remove the declaration; I only wanted to detect it and have the user/developer remove it. Here is what I did to detect it.

import re

# The beginning of an XML Declaration to match.
xmlRegex = '(<\\?xml)'
rg = re.compile(xmlRegex, re.IGNORECASE | re.DOTALL)

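# Raw string: otherwise "\f" in the path is read as a form-feed escape.
with open(r"c:\file.xml") as f:
    # enumerate() keeps a 1-based line counter for the failure message.
    for lineCount, line in enumerate(f, start=1):
        match = rg.search(line)

        if match:
            self.assertTrue(False, logger.failed("An XML Declaration was detected on line: " + str(lineCount) + "."))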

Other tips

If you are using a proper XML API such as xml.dom or ElementTree, you should not have any problem dealing with the XML declaration. However, if you still insist on removing the declaration, try this:

from xml.dom import minidom

def remove_xml_declaration(xml_text):
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    # Serializing only the root element leaves out the declaration;
    # without an encoding argument, toxml() returns text rather than bytes.
    xml_text_without_declaration = root.toxml()
    return xml_text_without_declaration

#
# Test
#

xml_text = '''<?xml version="1.0" encoding="UTF-8"?>
<data>
</data>
'''

# Remove declaration
xml_text = remove_xml_declaration(xml_text)
print(xml_text)
print('---')

# Remove declaration, even if it is not there
xml_text = remove_xml_declaration(xml_text)
print(xml_text)
print('---')
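
To illustrate the point that a parser handles the declaration transparently, here is a small sketch using ElementTree. The declaration is not an element, so the root is <data> whether or not the declaration is present, which is why the code in the question always prints data. The constant names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Same document as in the question, with and without the declaration.
WITH_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n<data>\n</data>\n'
WITHOUT_DECLARATION = b"<data>\n</data>\n"

root_with = ET.fromstring(WITH_DECLARATION)
root_without = ET.fromstring(WITHOUT_DECLARATION)

# The parser consumes the declaration; it never shows up as an element.
print(root_with.tag, root_without.tag)  # data data
```

Because the parsed tree looks the same in both cases, detecting the declaration has to happen on the raw file contents rather than on the ElementTree result.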
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow