Question

I'm traversing an XML tree and I'm having some trouble extracting a node from the tree while keeping its inner nodes.

For example:

<xml>
    <letter name="B">
        <letter name="D">
            <letter name="E">
                <letter name="F">
                    <letter name="G">

                    </letter>
                </letter>
            </letter>
        </letter>
    </letter>
</xml>

I need something like this:

<xml>
    <letter name="B">
        <letter name="D">
                <letter name="F">
                    <letter name="G">

                    </letter>
                </letter>
        </letter>
    </letter>
</xml>

But I can't get this without removing all of E's children.

Cheers!


Solution

The idea is to find the letter element with name="E", get its parent, remove the element from the parent, and extend the parent with the element's children:

import xml.etree.ElementTree as etree

data = """
<xml>
    <letter name="B">
        <letter name="D">
            <letter name="E">
                <letter name="F">
                    <letter name="G">

                    </letter>
                </letter>
            </letter>
        </letter>
    </letter>
</xml>
"""

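XPATH = './/letter[@name="E"]'

tree = etree.fromstring(data)
letter = tree.find(XPATH)            # the <letter name="E"> element
parent = tree.find(XPATH + '/..')    # its parent, <letter name="D">

parent.remove(letter)   # detach E (its text and tail go with it)
parent.extend(letter)   # append E's children (here just F) to D

print(etree.tostring(tree, encoding="unicode"))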

It prints:

<xml>
    <letter name="B">
        <letter name="D">
            <letter name="F">
                    <letter name="G">

                    </letter>
                </letter>
            </letter>
    </letter>
</xml>
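Two details are worth knowing here. `extend()` appends E's children at the end of D, so if E had siblings, F would end up after all of them instead of where E was. And `remove()` takes E's own `text` and `tail` with it, which is why the indentation shifts in the output above. A minimal sketch that splices the children into E's old position and carries its text over, assuming you care about mixed content or sibling order (`unwrap` is a hypothetical helper, not part of ElementTree):

```python
import xml.etree.ElementTree as etree

def unwrap(parent, child):
    """Replace `child` (a direct child of `parent`) with its own children,
    keeping their position and the text around them."""
    index = list(parent).index(child)
    grandchildren = list(child)

    if grandchildren:
        # Text after </child> now follows the last moved element.
        last = grandchildren[-1]
        last.tail = (last.tail or "") + (child.tail or "")
        leading = child.text or ""
    else:
        leading = (child.text or "") + (child.tail or "")

    # Text inside <child> before its first sub-element goes to the previous
    # sibling's tail, or to the parent's text if child came first.
    if index > 0:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + leading
    else:
        parent.text = (parent.text or "") + leading

    parent.remove(child)
    for offset, grandchild in enumerate(grandchildren):
        parent.insert(index + offset, grandchild)

root = etree.fromstring('<p>a<b>x<i>y</i>z</b>w</p>')
unwrap(root, root.find('b'))
print(etree.tostring(root, encoding="unicode"))  # <p>ax<i>y</i>zw</p>

root = etree.fromstring('<r><a/><e><f/><g/></e><h/></r>')
unwrap(root, root.find('e'))
print([c.tag for c in root])  # ['a', 'f', 'g', 'h']
```

For indentation-only whitespace this just concatenates the old whitespace; on Python 3.9+ `etree.indent(tree)` can re-indent the result afterwards.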

Update (using an iterative approach):

def iterparent(tree):
    # Yield every (parent, child) pair in the tree
    for parent in tree.iter():
        for child in parent:
            yield parent, child

tree = etree.fromstring(data)
# Collect the pairs first: modifying the tree while iterating over it is unsafe
for parent, child in list(iterparent(tree)):
    if child.tag == "letter" and child.attrib.get('name') == "E":
        parent.remove(child)
        parent.extend(child)

print(etree.tostring(tree, encoding="unicode"))

The iterparent() function is taken from the "Accessing Parents" section of the docs.
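A related pattern commonly used for the same job is to build a child-to-parent dictionary up front, `{c: p for p in tree.iter() for c in p}`. Because the map is built before anything changes, the tree is never modified while it is being iterated. A sketch under that approach, assuming you want to unwrap every matching element (including nested ones) rather than only the first; `unwrap_all` is a hypothetical name:

```python
import xml.etree.ElementTree as etree

def unwrap_all(root, name):
    """Replace every <letter name=...> below `root` with its own children."""
    # Child -> parent map, built once before the tree is modified.
    parent_map = {child: parent for parent in root.iter() for child in parent}
    targets = [el for el in root.iter("letter")
               if el.get("name") == name and el in parent_map]
    # Reverse document order handles descendants before their ancestors,
    # so each parent recorded in parent_map is still correct when used.
    for el in reversed(targets):
        parent = parent_map[el]
        index = list(parent).index(el)
        parent.remove(el)  # as in the answer above, el's text/tail are dropped
        for offset, child in enumerate(list(el)):
            parent.insert(index + offset, child)
    return root

root = etree.fromstring('<xml><letter name="D"><letter name="E">'
                        '<letter name="F"/></letter></letter></xml>')
unwrap_all(root, "E")
print(etree.tostring(root, encoding="unicode"))
# <xml><letter name="D"><letter name="F" /></letter></xml>
```

Unlike `extend()`, inserting at `index` keeps the moved children where the removed element used to be.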

OTHER TIPS

Another thing: is it possible to do something like this?

Initial XML

<xml>
    <letter name="B">
        <letter name="D">
            <letter name="E">
                <letter name="F">
                    <letter name="G">

                    </letter>
                </letter>
            </letter>
            <letter name="H">
                <letter name="I">

                </letter>
            </letter>
        </letter>
    </letter>
</xml>

Then get as output a list with two trees, something like this:

<xml>
    <letter name="B">
        <letter name="E">
            <letter name="F">
                <letter name="G">

                </letter>
            </letter>
        </letter>
    </letter>
</xml>


<xml>
    <letter name="B">
            <letter name="H">
                <letter name="I">

                </letter>
            </letter>
    </letter>
</xml>

As you can see, @falsetru and @alecxe, I just deleted D and kept only one of its children per tree.

Thanks!

I just finished it. I needed to copy the tree before the deletion; otherwise the original object gets modified.

Here is the solution. By the way, thanks a lot! XD

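import copy
import xml.etree.ElementTree as etree

def remove_letter(tree_original, letter, keep):
    # Work on a copy so the original tree is not modified
    tree = copy.deepcopy(tree_original)
    for parent in tree.iter():
        for index, child in enumerate(parent):
            if child.attrib.get('name') == letter:
                # Replace the letter element with its child number `keep`
                parent.remove(child)
                parent.insert(index, child[keep])
                return tree
    return tree

def get_next_trees(tree, letter="D"):
    # One new tree per child of the given letter element
    my_trees = []
    for parent in tree.iter():
        if parent.attrib.get('name') == letter:
            for keep in range(len(parent)):
                my_trees.append(remove_letter(tree, letter, keep))
            break
    return my_trees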
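The functions above rely on the XML being parsed elsewhere, so here is a self-contained sketch of the same idea run on the two-branch XML from this follow-up. It locates D's parent with the `..` step from the accepted answer instead of looping; `split_at` and `DATA` are hypothetical names, and D is assumed to be the element to drop:

```python
import copy
import xml.etree.ElementTree as etree

DATA = """<xml><letter name="B"><letter name="D">
<letter name="E"><letter name="F"><letter name="G"/></letter></letter>
<letter name="H"><letter name="I"/></letter>
</letter></letter></xml>"""

def split_at(root, name):
    """Return one deep copy of `root` per child of <letter name=name>,
    with that element replaced by the single child."""
    path = './/letter[@name="%s"]' % name
    target = root.find(path)
    if target is None:
        return []
    results = []
    for i in range(len(target)):
        tree = copy.deepcopy(root)         # never touch the original tree
        node = tree.find(path)
        parent = tree.find(path + "/..")
        index = list(parent).index(node)
        parent.remove(node)
        parent.insert(index, node[i])      # keep only the i-th child
        results.append(tree)
    return results

root = etree.fromstring(DATA)
for tree in split_at(root, "D"):
    print(etree.tostring(tree, encoding="unicode"))
```

The first tree holds B > E > F > G and the second B > H > I, while `root` still contains D.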
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow