Element Tree: How to parse subElements of child nodes

https://stackoverflow.com/questions/21531111

06-10-2022

Question

I have an XML tree that I'd like to parse using ElementTree. My XML looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<Ack>Success</Ack>
<Version>857</Version>
<Build>E857_INTL_APIXO_16643800_R1</Build>
<PaginationResult>
    <TotalNumberOfPages>1</TotalNumberOfPages>
    <TotalNumberOfEntries>2</TotalNumberOfEntries>
</PaginationResult>
<HasMoreOrders>false</HasMoreOrders>
<OrderArray>
    <Order>
        <OrderID>221362908003-1324471823012</OrderID>
        <CheckoutStatus>
            <eBayPaymentStatus>NoPaymentFailure</eBayPaymentStatus>
            <LastModifiedTime>2014-02-03T12:08:51.000Z</LastModifiedTime>
            <PaymentMethod>PaisaPayEscrow</PaymentMethod>
            <Status>Complete</Status>
            <IntegratedMerchantCreditCardEnabled>false</IntegratedMerchantCreditCardEnabled>
        </CheckoutStatus>
    </Order>
    <Order> ...
    </Order>
    <Order> ...
    </Order>
</OrderArray>
</GetOrdersResponse>

I want to parse the 6th child of the XML (OrderArray). I am able to get the values of subelements by index. E.g. if I want the OrderID of the first order, I can use root[5][0][0].text. But I would like to get the values of subelements by name. I tried the following code, but it does not print anything:

tree = ET.parse('response.xml')
root = tree.getroot()
for child in root:
    try:
        for ids in child.find('Order').find('OrderID'):
            print ids.text
    except:
        continue

Could someone please help me with this? Thanks.

Solution

Since the XML document has a namespace declaration (xmlns="urn:ebay:apis:eBLBaseComponents"), you have to use universal names when referring to elements in the document. For example, you need {urn:ebay:apis:eBLBaseComponents}OrderID instead of just OrderID.

This snippet prints all OrderIDs in the document:

from xml.etree import ElementTree as ET

NS = "urn:ebay:apis:eBLBaseComponents"

tree = ET.parse('response.xml')

for elem in tree.iter("*"):    # Use tree.getiterator("*") in Python 2.5 and 2.6
    if elem.tag == '{%s}OrderID' % NS:
        print(elem.text)

See http://effbot.org/zone/element-namespaces.htm for details about ElementTree and namespaces.
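
As an alternative sketch, find() and findall() also accept a prefix-to-URI mapping, so the namespace does not have to be spelled out in every tag. The prefix "e" and the inline SAMPLE string below are assumptions made for illustration; the real data would come from response.xml.

```python
from xml.etree import ElementTree as ET

# Trimmed-down stand-in for response.xml (illustrative only).
SAMPLE = """<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<OrderArray>
    <Order>
        <OrderID>221362908003-1324471823012</OrderID>
    </Order>
</OrderArray>
</GetOrdersResponse>"""

# The prefix "e" is arbitrary; only the URI has to match the document.
NSMAP = {"e": "urn:ebay:apis:eBLBaseComponents"}

root = ET.fromstring(SAMPLE)
# Walk OrderArray -> Order -> OrderID by name, using the prefix mapping.
order_ids = [el.text for el in root.findall("e:OrderArray/e:Order/e:OrderID", NSMAP)]
print(order_ids)
```

The mapping only affects how the search path is read; elem.tag still comes back in the {uri}name form.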

OTHER TIPS

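Try to avoid chaining your finds. If the first find() does not match anything, it returns None, and calling find() on None raises an AttributeError. Check each result before using it, and remember that the tag names have to include the namespace:

NS = '{urn:ebay:apis:eBLBaseComponents}'  # the document's default namespace

for child in root:
    order = child.find(NS + 'Order')
    if order is not None:
        ids = order.find(NS + 'OrderID')
        if ids is not None:
            print(ids.text)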
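
If only the text is needed, findtext() may be able to replace the find-then-check pattern, since it returns a default value instead of raising when the path matches nothing. A minimal sketch, assuming an inline SAMPLE document in place of response.xml and a hypothetical helper named order_id_of:

```python
from xml.etree import ElementTree as ET

NS = "{urn:ebay:apis:eBLBaseComponents}"

# Trimmed-down stand-in for response.xml (illustrative only).
SAMPLE = """<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<Ack>Success</Ack>
<OrderArray>
    <Order>
        <OrderID>221362908003-1324471823012</OrderID>
    </Order>
</OrderArray>
</GetOrdersResponse>"""


def order_id_of(element):
    """Return the first Order/OrderID text under element, or None if absent."""
    path = "{0}Order/{0}OrderID".format(NS)
    return element.findtext(path, default=None)


root = ET.fromstring(SAMPLE)
# Ack has no Order child, so it yields None rather than an exception.
results = [order_id_of(child) for child in root]
print(results)
```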

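You can find the OrderArray first and then iterate over its children by name. Because of the namespace declaration, every name has to be qualified with the namespace URI:

from xml.etree import ElementTree as ET

NS = '{urn:ebay:apis:eBLBaseComponents}'  # the document's default namespace

tree = ET.parse('response.xml')
root = tree.getroot()
order_array = root.find(NS + 'OrderArray')
for order in order_array.findall(NS + 'Order'):
    order_id_element = order.find(NS + 'OrderID')
    if order_id_element is not None:
        print(order_id_element.text)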
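
To read several subelements by name at once, one option is to collect the children of CheckoutStatus into a dict keyed by tag name with the namespace stripped. This is only a sketch: the SAMPLE string is a shortened copy of the XML from the question, and local_name is a hypothetical helper, not part of ElementTree.

```python
from xml.etree import ElementTree as ET

NS = "{urn:ebay:apis:eBLBaseComponents}"

# Shortened stand-in for response.xml (illustrative only).
SAMPLE = """<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<OrderArray>
    <Order>
        <OrderID>221362908003-1324471823012</OrderID>
        <CheckoutStatus>
            <PaymentMethod>PaisaPayEscrow</PaymentMethod>
            <Status>Complete</Status>
        </CheckoutStatus>
    </Order>
</OrderArray>
</GetOrdersResponse>"""


def local_name(tag):
    """Strip a leading '{namespace}' from a tag, if there is one."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(SAMPLE)
orders = []
for order in root.iter(NS + "Order"):
    status = order.find(NS + "CheckoutStatus")
    # Compare against None explicitly; an Element's truth value is not reliable.
    fields = {} if status is None else {local_name(f.tag): f.text for f in status}
    orders.append((order.findtext(NS + "OrderID"), fields))
print(orders)
```

With the dict in hand, a value such as the checkout status can be looked up as fields["Status"].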

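A side note: never use a bare except: continue. It hides every exception you get, including the AttributeError raised in the question's loop when find('Order') returns None, and that makes debugging really hard.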
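
To make the point concrete, here is a small sketch of the question's loop that catches only AttributeError and records it instead of discarding it. The inline SAMPLE string and the errors list are assumptions for illustration.

```python
from xml.etree import ElementTree as ET

# Trimmed-down stand-in for response.xml (illustrative only).
SAMPLE = """<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<OrderArray><Order><OrderID>221362908003-1324471823012</OrderID></Order></OrderArray>
</GetOrdersResponse>"""

root = ET.fromstring(SAMPLE)
errors = []
for child in root:
    try:
        # Un-namespaced names match nothing here, so the first find() returns None.
        child.find('Order').find('OrderID')
    except AttributeError as exc:
        # Catch only the error that is expected, and keep it visible.
        errors.append(str(exc))
print(errors)
```

Once the message about NoneType is visible, the real cause (find() returning None because of the namespace) is much easier to track down.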

Licensed under: CC-BY-SA with attribution