Question

I have the following problematic code:

import xml.etree.ElementTree as ET

def main():
    tree = ET.parse('D:/Developer/Work/oDesk1/assets/test.xml')
    root = tree.getroot()
    # Prints [] for the first XML below, a populated list for the second
    print(root.findall('.//country'))

if __name__ == '__main__':
    main()

I have the following XML:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="recipes.xsl"?>
<data xmlns="urn:Test" xmlns:xs="http://www.w3.org/2001/XMLSchema" name="Galaxy" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:Test test.xsd">
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Problem:

root.findall('.//country') returns an empty list.

Expected:

root.findall('.//country') should return a populated list.

Removing the namespace and schema attributes from the root element, as below, resolves the problem:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="recipes.xsl"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

I don't understand this behaviour. I am aware that the first XML points to an XML schema. I want to bypass the schema information, without editing the XML file, so that root.findall() returns a populated list. How can I do that? Also, please explain why these two slightly different XML files behave differently.


Solution

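The schema isn't the problem; the namespace is. The xmlns="urn:Test" attribute on <data> declares a default namespace, so <data>, <country> and every other unprefixed element inside it belong to urn:Test. ElementTree stores such tags as {urn:Test}country rather than country, and a bare name in findall only matches elements that are in no namespace. (The xsi:schemaLocation attribute is only a hint; ElementTree does not load or validate against the schema.) You'll need to include the namespace in the query yourself, using ElementTree's {uri}localname syntax: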

root.findall('.//{urn:Test}country')

will return the expected elements.
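A minimal, self-contained sketch of the explanation above, assuming Python 3. It parses a trimmed copy of the question's XML from a string with ET.fromstring instead of reading the file; the names XML and countries are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, keeping the default namespace
# declaration and the (ignored) schemaLocation hint.
XML = """<?xml version="1.0"?>
<data xmlns="urn:Test" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:Test test.xsd" name="Galaxy">
    <country name="Liechtenstein"><year>2008</year></country>
    <country name="Singapore"><year>2011</year></country>
    <country name="Panama"><year>2011</year></country>
</data>"""

root = ET.fromstring(XML)

# Unprefixed elements inherit the default namespace, so the tag is
# stored in "{namespace-uri}localname" form.
print(root.tag)                              # {urn:Test}data

# A bare name only matches elements in no namespace, hence the empty list.
print(root.findall('.//country'))            # []

# Spelling out the namespace URI in the path finds the elements.
countries = root.findall('.//{urn:Test}country')
print([c.get('name') for c in countries])    # ['Liechtenstein', 'Singapore', 'Panama']
```

Unprefixed attributes such as name are not placed in the default namespace, which is why c.get('name') works without any {uri} part.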

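findall also takes an optional namespaces argument that maps prefixes to URIs, but it doesn't seem to work with an empty prefix for an implicit default namespace. The prefix in your query doesn't have to match the document, though, so binding any prefix you like to the URI works: root.findall('.//t:country', {'t': 'urn:Test'}).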
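Below is a sketch of three ways to get the populated list without editing the file, assuming Python 3.8 or later (needed for the {*} wildcard). The prefix t and the helper strip_namespaces are hypothetical names chosen for illustration, not part of ElementTree. Stripping namespaces is a swapped-in technique that leaves the file on disk untouched but rewrites the tags of the in-memory tree.

```python
import xml.etree.ElementTree as ET

XML = """<data xmlns="urn:Test" name="Galaxy">
    <country name="Liechtenstein"><neighbor name="Austria" direction="E"/></country>
    <country name="Singapore"><neighbor name="Malaysia" direction="N"/></country>
    <country name="Panama"><neighbor name="Colombia" direction="E"/></country>
</data>"""

root = ET.fromstring(XML)

# Option 1: map an arbitrary prefix ("t") to the document's namespace URI.
# Only the URI has to match; the document itself uses no prefix.
ns = {'t': 'urn:Test'}
by_prefix = root.findall('.//t:country', ns)

# Option 2 (Python 3.8+): {*} matches the local name in any namespace or none.
by_wildcard = root.findall('.//{*}country')

# Option 3: strip namespaces from the parsed tree so bare names match.
def strip_namespaces(elem):
    """Rewrite every tag from '{uri}local' to plain 'local', in place."""
    for el in elem.iter():
        # Guard against non-string tags (e.g. comments, if a parser keeps them).
        if isinstance(el.tag, str) and el.tag.startswith('{'):
            el.tag = el.tag.split('}', 1)[1]
    return elem

strip_namespaces(root)
by_bare_name = root.findall('.//country')

for result in (by_prefix, by_wildcard, by_bare_name):
    print([c.get('name') for c in result])   # ['Liechtenstein', 'Singapore', 'Panama']
```

Option 1 is the most portable. Option 3 is convenient when a lot of existing code uses bare names, but any tree written back out with ET.write will no longer carry the namespace.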

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow