Question

I have the following problematic code:

import xml.etree.ElementTree as ET    

def main():
    tree = ET.parse('D:/Developer/Work/oDesk1/assets/test.xml')
    root = tree.getroot()
    print(root.findall('.//country'))

if __name__ == '__main__':
    main()

I have the following XML:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="recipes.xsl"?>
<data xmlns="urn:Test" xmlns:xs="http://www.w3.org/2001/XMLSchema" name="Galaxy" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:Test test.xsd">
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Problem:

root.findall('.//country') returns an empty list.

Expected:

root.findall('.//country') should return a populated list.

Changing the XML data to the following resolves my problem.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="recipes.xsl"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

I don't understand this behaviour. I know that the first XML document references an XML schema. I want to bypass the schema information without editing the XML file, so that calling root.findall() returns a populated list. How can I do that? Please also explain why the two slightly different XML documents behave differently.


Solution

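The difference is the xmlns="urn:Test" attribute on <data>, not the schema reference. It declares a default namespace, so <data> and every unprefixed element inside it, including each <country>, belongs to the urn:Test namespace. ElementTree stores such tags as {urn:Test}country, and a path that names a plain country only matches elements that have no namespace. etree doesn't handle default namespaces conveniently, so you need to include the namespace yourself, using its own {uri}tag syntax: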

root.findall('.//{urn:Test}country')

will return the expected elements.
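A minimal sketch of what is going on, using a trimmed copy of the question's XML inlined as a string (element contents shortened; the names NAMESPACED and PLAIN are just illustrative). It parses the document with and without the default xmlns declaration and prints the tags ElementTree actually stores. The xsi:schemaLocation attribute is kept in both versions, which shows that the namespace, not the schema reference, changes the result.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML. The root keeps the default namespace
# (xmlns="urn:Test") and the xsi:schemaLocation reference from the original.
NAMESPACED = """<?xml version="1.0"?>
<data xmlns="urn:Test"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:Test test.xsd" name="Galaxy">
    <country name="Liechtenstein"><year>2008</year></country>
    <country name="Singapore"><year>2011</year></country>
    <country name="Panama"><year>2011</year></country>
</data>"""

# Same document with only the default-namespace declaration removed;
# the schema reference is still there.
PLAIN = NAMESPACED.replace(' xmlns="urn:Test"', '')

ns_root = ET.fromstring(NAMESPACED)
plain_root = ET.fromstring(PLAIN)

# ElementTree stores namespaced tags in "{uri}localname" form.
print(ns_root.tag)          # {urn:Test}data
print(ns_root[0].tag)       # {urn:Test}country
print(plain_root[0].tag)    # country

# A bare name only matches elements that have no namespace...
print(len(ns_root.findall('.//country')))            # 0
print(len(plain_root.findall('.//country')))         # 3
# ...so for the namespaced document the URI has to be spelled out.
print(len(ns_root.findall('.//{urn:Test}country')))  # 3
```

ElementTree does not load or validate the schema named in xsi:schemaLocation; to the parser it is just another attribute.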

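findall also takes an optional namespaces argument that maps prefixes to namespace URIs, but it doesn't help directly with the 'empty' implicit (default) namespace: in the Python versions current at the time, an entry for the empty prefix is ignored, so you have to give the namespace a prefix of your own and use that prefix in the path.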
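A sketch of the namespaces argument, again on a trimmed, inlined copy of the question's XML. The prefix t is an arbitrary choice for illustration; only the URI has to match the document. The last two queries rely on features that, as far as I know, were added in Python 3.8 (an empty-string prefix acting as the default namespace for unprefixed names, and the {*} wildcard), so they sit behind a version check.

```python
import sys
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, keeping the default namespace.
XML = """<?xml version="1.0"?>
<data xmlns="urn:Test" name="Galaxy">
    <country name="Liechtenstein"/>
    <country name="Singapore"/>
    <country name="Panama"/>
</data>"""

root = ET.fromstring(XML)

# Any version: bind the URI to a prefix of your choosing and use it in the path.
NS = {'t': 'urn:Test'}  # "t" is arbitrary; only the URI must match
names = [c.get('name') for c in root.findall('.//t:country', NS)]
print(names)  # ['Liechtenstein', 'Singapore', 'Panama']

if sys.version_info >= (3, 8):
    # '' as the prefix puts every unprefixed name in the path into urn:Test.
    default_ns = root.findall('.//country', {'': 'urn:Test'})
    # '{*}' matches the tag in any namespace (or none).
    wildcard = root.findall('.//{*}country')
    print(len(default_ns), len(wildcard))  # 3 3
```

The prefix mapping keeps the query readable without hard-coding {urn:Test} into every step, and the XML file itself stays untouched.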

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow