Question

I have found many examples of removing an element node from an XML file, but here is a case for which I didn't find a solution on either Stack Overflow or Google. For example:

<slide>
    America
    <a> 2 </a>
    <b> 3 </b>
    <c> 4 </c>
</slide>

<slide>
    Germany
    <a> 5 </a>
    <b> 6 </b>
    <c> 7 </c>
</slide>

I would use the remove function to delete an element node, since I am using lxml. But now I have to delete "America" and "Germany", which are not element nodes but text.

Is there a function, or some other way, to remove this text?

I am currently using the Python lxml library.

The output should look like this:

 <slide>
     <a> 2 </a>
     <b> 3 </b>
     <c> 4 </c>
 </slide>

 <slide>
     <a> 5 </a>
     <b> 6 </b>
     <c> 7 </c>
 </slide>

Solution

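Use the text property. In lxml, the text that appears before an element's first child is stored in that element's text attribute, so clearing it removes "America" and "Germany". For example: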

html = '''...
<slide>
    America
    <a> 2 </a>
    <b> 3 </b>
    <c> 4 </c>
</slide>

<slide>
    Germany
    <a> 5 </a>
    <b> 6 </b>
    <c> 7 </c>
</slide>
....'''

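import lxml.html

# fromstring() returns a single element; a fragment with several top-level nodes gets wrapped in one parent
root = lxml.html.fromstring(html)
for slide in root.xpath('.//slide'):
    # .text holds the text between <slide> and its first child element
    slide.text = ''
print(lxml.html.tostring(root).decode())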
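
Since the question is about XML rather than HTML, here is a minimal sketch of the same idea with an XML parser. It is not part of the original answer and makes a few assumptions: the `<slides>` wrapper element is hypothetical (added only so the fragment has a single root), `strip_leading_text` is an illustrative helper name, and the `indent` string is a guess at the indentation you want to keep. Setting `text` to an indentation string instead of `''` keeps the first child on its own line. The fallback import is there only so the sketch runs without lxml installed; the standard library's ElementTree uses the same `text`/`tail` model.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback: the standard library exposes the same .text/.tail model
    import xml.etree.ElementTree as etree

# Hypothetical <slides> wrapper so the fragment is well-formed XML.
xml = """<slides>
<slide>
    America
    <a> 2 </a>
    <b> 3 </b>
    <c> 4 </c>
</slide>
<slide>
    Germany
    <a> 5 </a>
    <b> 6 </b>
    <c> 7 </c>
</slide>
</slides>"""


def strip_leading_text(root, tag="slide", indent="\n    "):
    """Replace the text before the first child of every <tag> with indent."""
    for element in root.iter(tag):
        # .text is only the text between the start tag and the first child;
        # text that follows a child lives in that child's .tail instead.
        element.text = indent
    return root


root = strip_leading_text(etree.fromstring(xml))
result = etree.tostring(root, encoding="unicode")
print(result)
```

If the unwanted text came after a child element instead (for example after `</a>`), it would be stored in that child's `tail` attribute rather than in the parent's `text`, and the same assignment on `tail` would be needed.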
Licensed under: CC-BY-SA with attribution