Question

Here's my XML:

<beans>
    <property name = "type1">
        <list>
            <bean class = "bean1">
                <property name = "typeb">
                    <value>foo</value>
                </property>
            </bean>
            <bean class = "bean2">
                <property name ="typeb">
                    <value>bar</value>
                </property>
            </bean>
        </list>
    </property>

    <property name = "type2">
        <list>
            <bean class = "bean3">
                <list>
                    <property name= "typec">
                        <sometags/>
                    </property>
                    <property name= "typed">
                        <list>
                            <value>foo</value>
                            <value>bar</bar>
                        </list>
                    </property> 
               </list>


            </bean>
        </list>
    </property>
</beans>

Now what we want to do is scan through this and delete these elements:

            <bean class = "bean1">
                <property name = "typeb">
                    <value>foo</value>
                </property>
            </bean>

And:

            <value>foo</value>

(from the property name = "typed" element).

Now to achieve this, what I'd like to do is something like this:

for element in root.iter('value'):
    if element.text == 'foo':
        p1= element.getParent()
        if p1.tag == 'list': #second case scenario, remove just the value tag. 
            p1.remove(element)
        else: #first case scenario - remove entire bean
            p2 = p1.getParent()
            p3 = p2.getParent()
            p3.remove(p2)

However, ElementTree doesn't give a child element access to its parent.

What would an effective way to achieve this be? Given that it is a deep XML structure, I don't quite like the idea of a recursive function that checks the tag types at each level.


Solution 2

Here's how I solved it:

# Yields every (parent, child) pair in the tree.
def iterparent(tree):
    for parent in tree.iter():  # getiterator() was removed in Python 3.9
        for child in parent:
            yield parent, child

# Recursive function. Deletes the ancestor n levels above the given child.
# If n = 0 it deletes just the child.
def removeParent(root, childToRemove, n):
    # Snapshot the pairs so that removing a node does not disturb the loop.
    for parent, child in list(iterparent(root)):
        if child is childToRemove:
            if n > 0:
                removeParent(root, parent, n - 1)
            else:
                parent.remove(child)
            return

# Assumes root is the parsed <beans> element and valuesToDelete is a
# collection of strings such as {'foo'}.
for parent, child in list(iterparent(root)):
    if child.tag == 'value' and child.text in valuesToDelete:
        if parent.tag == 'list':
            removeParent(root, child, 0)
        else:
            removeParent(root, child, 2)

It's actually quite elegant. I like it.

For my purposes, this works well, but the hard-coded ancestor counts (0 and 2) would need adjusting for documents with a wider range of element structures and depths.
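The same (parent, child) pairs can also be collected once into a dictionary, so each removal is a lookup instead of a fresh scan of the tree. The following is only a sketch of that idea, not the code from this answer: the names delete_values, ancestor and parent_of are made up for illustration, and the XML is a trimmed copy of the question's document with the mismatched closing tag fixed.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the question's XML (closing tag corrected).
XML = """
<beans>
    <property name="type1">
        <list>
            <bean class="bean1">
                <property name="typeb"><value>foo</value></property>
            </bean>
            <bean class="bean2">
                <property name="typeb"><value>bar</value></property>
            </bean>
        </list>
    </property>
    <property name="type2">
        <list>
            <bean class="bean3">
                <list>
                    <property name="typed">
                        <list>
                            <value>foo</value>
                            <value>bar</value>
                        </list>
                    </property>
                </list>
            </bean>
        </list>
    </property>
</beans>
"""

def ancestor(parent_of, element, levels):
    """Walk `levels` steps up from element (0 returns the element itself)."""
    for _ in range(levels):
        element = parent_of[element]
    return element

def delete_values(root, values_to_delete):
    # Build the child -> parent lookup once, before anything is removed.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Collect the targets first so the tree is not modified while iterating.
    targets = [v for v in root.iter("value") if v.text in values_to_delete]
    for value in targets:
        # Directly inside a <list>: drop only the <value>.
        # Otherwise: drop the <bean> two levels up (value -> property -> bean).
        levels = 0 if parent_of[value].tag == "list" else 2
        doomed = ancestor(parent_of, value, levels)
        parent_of[doomed].remove(doomed)

root = ET.fromstring(XML)
delete_values(root, {"foo"})
print([b.get("class") for b in root.iter("bean")])
print([v.text for v in root.iter("value")])
```

The level counts are still hard-coded, so this shares the limitation mentioned above; it only avoids rescanning the tree for every removal.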

OTHER TIPS

With ElementTree, start from the parent and search down to the relevant child:

>>> parent = root.find('.//bean[@class="bean1"]')
>>> parent
<Element 'bean' at 0x10eb31550>
>>> parent.find('.//value').text
'foo'
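Working from the parent down can be extended to the deletion itself: visit every list element and decide, for each of its direct children, whether it should go. This is a rough sketch under some assumptions rather than a tested recipe: prune_lists is a hypothetical name, the XML is a shortened copy of the question's document with the closing tag fixed, and a bean is assumed to be removable when one of its own property children holds the unwanted value.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's XML (closing tag corrected).
XML = """
<beans>
    <property name="type1">
        <list>
            <bean class="bean1">
                <property name="typeb"><value>foo</value></property>
            </bean>
            <bean class="bean2">
                <property name="typeb"><value>bar</value></property>
            </bean>
        </list>
    </property>
    <property name="type2">
        <list>
            <bean class="bean3">
                <list>
                    <property name="typed">
                        <list>
                            <value>foo</value>
                            <value>bar</value>
                        </list>
                    </property>
                </list>
            </bean>
        </list>
    </property>
</beans>
"""

def prune_lists(root, unwanted="foo"):
    # Snapshot the <list> elements, since the tree is modified below.
    for lst in list(root.iter("list")):
        # Iterate over a copy of the children so removal is safe.
        for child in list(lst):
            if child.tag == "value" and child.text == unwanted:
                # Second case: remove just the <value>.
                lst.remove(child)
            elif child.tag == "bean" and any(
                v.text == unwanted for v in child.findall("./property/value")
            ):
                # First case: remove the whole <bean>.
                lst.remove(child)

root = ET.fromstring(XML)
prune_lists(root)
print(ET.tostring(root, encoding="unicode"))
```

Because each list is the parent doing the removing, no child ever needs to look up its own parent.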

The lxml.etree module has a getparent method. Given your example XML (well, after fixing the mismatched closing tag), I can do this:

>>> from lxml import etree
>>> 
>>> with open('data.xml') as fd:
...     doc = etree.parse(fd)
... 
>>> matches = doc.xpath('//value[text()="foo"]')
>>> element = matches[0]
>>> etree.tostring(element)
'<value>foo</value>\n        '
>>> parent = element.getparent()
>>> print etree.tostring(element)
<value>foo</value>

>>> parent = element.getparent()
>>> print etree.tostring(parent)
<property name="typeb">
          <value>foo</value>
        </property>
>>> parent = parent.getparent()
>>> print etree.tostring(parent)
<bean class="bean1">
        <property name="typeb">
          <value>foo</value>
        </property>
      </bean>

..and so forth.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow