Parsing xml with etree Python

https://stackoverflow.com//questions/22007185

21-12-2019
|

Question

for this xml

<locations>

    <location>
        <locationid>1</locationid>
        <homeID>281</homeID>
        <buildingType>Added</buildingType>
        <address>A</address>
        <address2>This is address2</address2>
        <city>This is city/city>
        <state>State here</state>
        <zip>1234</zip>
    </location>
    <location>
        <locationid>2</locationid>
        <homeID>81</homeID>
        <buildingType>Added</buildingType>
        <address>B</address>
        <address2>This is address2</address2>
        <city>This is city/city>
        <state>State here</state>
        <zip>1234</zip>
    </location>
    .
    .
    .
    .
    <location>
        <locationid>10</locationid>
        <homeID>21</homeID>
        <buildingType>Added</buildingType>
        <address>Z</address>
        <address2>This is address2</address2>
        <city>This is city/city>
        <state>State here</state>
        <zip>1234</zip>
    </location>
</locations>

How can i get locationID for the address A , Using etree.

Here is my code ,

import urllib2
import lxml.etree as ET

url="url for the xml"
xmldata = urllib2.urlopen(url).read()
# print xmldata
root = ET.fromstring(xmldata)
for target in root.xpath('.//location/address[text()="A"]'):
    print target.find('LocationID')

Getting output as None , Whats wrong i am doing here ?

Solution

First of all, your xml is not well-formed. You should take more care when posting it and try to avoid to other users to fix your data.

You can search for the preceding sibling, like:

import urllib2
import lxml.etree as ET

url="..."
xmldata = urllib2.urlopen(url).read()
root = ET.fromstring(xmldata)
for target in root.xpath('.//location/address[text()="A"]'):                                                                                                  
    for location in [e for e in target.itersiblings(preceding=True) if e.tag == "locationid"]:                                                                
        print location.text

Or do it directly from the xpath expression, like:

import urllib2
import lxml.etree as ET

url="..."
xmldata = urllib2.urlopen(url).read()
root = ET.fromstring(xmldata)
print root.xpath('.//location/address[text()="A"]/preceding-sibling::locationid/text()')[0]

Run either of them like:

python2 script.py

That yield:

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow