I have some XML, a fragment of which looks like:

<osgb:departedMember>
<osgb:DepartedFeature fid='osgb4000000024942964'>
<osgb:boundedBy>
<gml:Box srsName='osgb:BNG'>
<gml:coordinates>188992.575,55981.029 188992.575,55981.029</gml:coordinates>
</gml:Box>
</osgb:boundedBy>
<osgb:theme>Road Network</osgb:theme>
<osgb:reasonForDeparture>Deleted</osgb:reasonForDeparture>
<osgb:deletionDate>2014-02-19</osgb:deletionDate>
</osgb:DepartedFeature>
</osgb:departedMember>

I am parsing it with:

departedmembers = doc_root.findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}departedMember')
for departedmember in departedmembers:
    findWhat='{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}DepartedFeature'
    fid = int(departedmember.find(findWhat).attrib['fid'].replace('osgb', ''))
    theme=departedmember[0].findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}theme')[0].text
    reason=departedmember[0].findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}reasonForDeparture')[0].text
    date=departedmember[0].findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}deletionDate')[0].text

Occasionally the reason, the date, or both are absent, i.e. the element itself is missing rather than present with empty content. This is legitimate according to the XSD, but the code then fails when it tries to read the text of a non-existent element (findall returns an empty list, so the [0] raises an IndexError). To deal with that I have put the reason and date lines in try/except blocks, like:

try:
    date=departedmember[0].findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}deletionDate')[0].text
except:
    pass

This works, but I dislike using except/pass like this, so I am wondering whether there is a nicer way to parse a document like this, where some elements are optional.


Solution

Since you are only interested in the first element returned by findall, you can replace findall(x)[0] with find(x), which returns None when nothing matches instead of raising. To avoid the try/except blocks, you can then use a conditional expression:

departedmembers = doc_root.findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}departedMember')
for departedmember in departedmembers:
    ...
    date = departedmember[0].find('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}deletionDate')
    date = None if date is None else date.text  # assuming you want None when the element was not found
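
If the document is parsed with xml.etree.ElementTree (an assumption; lxml offers the same method), findtext folds the lookup and the missing-element check into one call by returning a default. The sketch below is only illustrative: the sample document, the second fid, and the parse_departed helper are made up for the example, and only the namespace URI comes from the question.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the question; the "osgb" prefix mapping is our choice.
NS = {"osgb": "http://www.ordnancesurvey.co.uk/xml/namespaces/osgb"}

# Hypothetical sample: the second feature has no reason or date element.
SAMPLE = """\
<root xmlns:osgb="http://www.ordnancesurvey.co.uk/xml/namespaces/osgb">
  <osgb:departedMember>
    <osgb:DepartedFeature fid="osgb4000000024942964">
      <osgb:theme>Road Network</osgb:theme>
      <osgb:reasonForDeparture>Deleted</osgb:reasonForDeparture>
      <osgb:deletionDate>2014-02-19</osgb:deletionDate>
    </osgb:DepartedFeature>
  </osgb:departedMember>
  <osgb:departedMember>
    <osgb:DepartedFeature fid="osgb4000000024942965">
      <osgb:theme>Road Network</osgb:theme>
    </osgb:DepartedFeature>
  </osgb:departedMember>
</root>
"""


def parse_departed(doc_root):
    """Collect one dict per DepartedFeature; missing optional fields become None."""
    rows = []
    for feature in doc_root.findall("osgb:departedMember/osgb:DepartedFeature", NS):
        rows.append({
            "fid": int(feature.attrib["fid"].replace("osgb", "")),
            "theme": feature.findtext("osgb:theme", default=None, namespaces=NS),
            # findtext returns the default instead of raising when the element is absent.
            "reason": feature.findtext("osgb:reasonForDeparture", default=None, namespaces=NS),
            "date": feature.findtext("osgb:deletionDate", default=None, namespaces=NS),
        })
    return rows


rows = parse_departed(ET.fromstring(SAMPLE))
```

Note that findtext returns an empty string, not the default, when the element exists but has no text, so the two cases stay distinguishable.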

Other tips

Yes. The issue is not the search method itself but indexing into the returned list when it is empty. You can write your code like this:

results = departedmember[0].findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}deletionDate')

if results:
    date = results[0].text
else:
    # There is no such element; handle that case here,
    # for example by falling back to None.
    date = None
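
With several optional fields, the same check can be wrapped once rather than repeated. A rough sketch, assuming xml.etree.ElementTree; the first_text helper, the OSGB constant, and the one-element sample (including its fid) are hypothetical names and data introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace from the question, in the {uri}tag form ElementTree expects.
OSGB = "{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}"


def first_text(parent, tag, default=None):
    """Return the text of the first matching child, or default if there is none."""
    results = parent.findall(OSGB + tag)
    if results:
        return results[0].text
    return default


# Hypothetical feature with a theme but no deletionDate.
feature = ET.fromstring(
    '<osgb:DepartedFeature '
    'xmlns:osgb="http://www.ordnancesurvey.co.uk/xml/namespaces/osgb" '
    'fid="osgb1">'
    '<osgb:theme>Road Network</osgb:theme>'
    '</osgb:DepartedFeature>'
)

theme = first_text(feature, "theme")
date = first_text(feature, "deletionDate")  # missing element, so the default is used
```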
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow