XML Tree parsing with condition in Python

https://stackoverflow.com/questions/22720490

23-06-2023

Question

Here is my XML structure:

<images>
  <image>
    <name>brain tumer</name>
    <location>images/brain_tumer1.jpg</location>
    <annotations>
      <comment>
        <name>Patient 0 Brain Tumer</name>
        <description>
          This is a tumer in the brain
        </description>
      </comment>
    </annotations>
  </image>
  <image>
    <name>brain tumer</name>
    <location>img/brain_tumer2.jpg</location>
    <annotations>
      <comment>
        <name>Patient 1 Brain Tumer</name>
        <description>
          This is a larger tumer in the brain
        </description>
      </comment>
    </annotations>
  </image>
</images>

I am new to Python and wanted to know whether it is possible to retrieve the location data based on the comment:name data. Here is my code so far:

for itr1 in itemlist:
    commentItemList = itr1.getElementsByTagName('name')

    for itr2 in commentItemList:
        if(itr2.firstChild.nodeValue == "Patient 1 Liver Tumer"):
            commentName = itr2.firstChild.nodeValue
            Loacation = it1.secondChild.nodeValue

Any recommendations, or am I missing something here? Thank you in advance.

Solution

Parsing XML with minidom isn't much fun, but here's the idea:

1. Iterate over all image nodes.
2. For each node, check the comment/name text.
3. If the text matches, get the location node's text.

An example that finds the location for the comment named Patient 1 Brain Tumer:

import xml.dom.minidom

data = """
your xml goes here
"""

dom = xml.dom.minidom.parseString(data)
for image in dom.getElementsByTagName('image'):
    comment = image.getElementsByTagName('comment')[0]
    comment_name_text = comment.getElementsByTagName('name')[0].firstChild.nodeValue
    if comment_name_text == 'Patient 1 Brain Tumer':
        location = image.getElementsByTagName('location')[0]
        # print() with a single argument works in both Python 2 and Python 3
        print(location.firstChild.nodeValue)

prints:

img/brain_tumer2.jpg
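
The loop above assumes that every image has a comment with a name; an image without one would raise an IndexError on the [0] lookup. Below is a minimal sketch of a more defensive variant wrapped in a helper. The function name find_location and the shortened SAMPLE document are assumptions made for illustration and are not part of the original answer.

```python
import xml.dom.minidom

# Shortened copy of the XML from the question (descriptions omitted).
SAMPLE = """<images>
  <image>
    <name>brain tumer</name>
    <location>images/brain_tumer1.jpg</location>
    <annotations>
      <comment><name>Patient 0 Brain Tumer</name></comment>
    </annotations>
  </image>
  <image>
    <name>brain tumer</name>
    <location>img/brain_tumer2.jpg</location>
    <annotations>
      <comment><name>Patient 1 Brain Tumer</name></comment>
    </annotations>
  </image>
</images>"""


def find_location(xml_text, comment_name):
    """Return the location text of the first image whose comment name
    equals comment_name, or None if nothing matches."""
    dom = xml.dom.minidom.parseString(xml_text)
    for image in dom.getElementsByTagName('image'):
        for comment in image.getElementsByTagName('comment'):
            names = comment.getElementsByTagName('name')
            # Skip comments that have no <name> or an empty <name/>.
            if not names or names[0].firstChild is None:
                continue
            if names[0].firstChild.nodeValue.strip() == comment_name:
                locations = image.getElementsByTagName('location')
                if locations and locations[0].firstChild is not None:
                    return locations[0].firstChild.nodeValue.strip()
    return None


print(find_location(SAMPLE, 'Patient 1 Brain Tumer'))  # img/brain_tumer2.jpg
print(find_location(SAMPLE, 'Patient 1 Liver Tumer'))  # None
```

The second call uses the name from the question's code, which does not occur in the XML, so the helper returns None instead of raising.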

OTHER TIPS

For comparison, here's how you can do the same with lxml:

from lxml import etree

data = """
your xml goes here
"""

root = etree.fromstring(data)
print(root.xpath('//image[.//comment/name = "Patient 1 Brain Tumer"]/location/text()')[0])

prints:

img/brain_tumer2.jpg

Basically, one line vs six.
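
If installing lxml is not an option, the standard library's xml.etree.ElementTree sits somewhere in between. It supports only a subset of XPath, and as far as I know it does not accept the nested predicate used in the lxml expression above, so the condition is checked in Python instead. This is a sketch under those assumptions; the helper name find_location_et and the shortened SAMPLE document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (descriptions omitted).
SAMPLE = """<images>
  <image>
    <name>brain tumer</name>
    <location>images/brain_tumer1.jpg</location>
    <annotations>
      <comment><name>Patient 0 Brain Tumer</name></comment>
    </annotations>
  </image>
  <image>
    <name>brain tumer</name>
    <location>img/brain_tumer2.jpg</location>
    <annotations>
      <comment><name>Patient 1 Brain Tumer</name></comment>
    </annotations>
  </image>
</images>"""


def find_location_et(xml_text, comment_name):
    """Return the location text of the first image that has a
    comment/name equal to comment_name, or None if nothing matches."""
    root = ET.fromstring(xml_text)
    for image in root.iter('image'):
        # './/comment/name' finds comment names at any depth below this image.
        for name in image.findall('.//comment/name'):
            if (name.text or '').strip() == comment_name:
                location = image.findtext('location')
                return location.strip() if location is not None else None
    return None


print(find_location_et(SAMPLE, 'Patient 1 Brain Tumer'))  # img/brain_tumer2.jpg
```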

Licensed under: CC-BY-SA with attribution
