Question

I want to extract the text associated with every en-todo element in the XML of an Evernote note. Is there a function that returns all elements with a given tag name under the root in one line?

I have tried root.findall("en-todo"), but it returns an empty list.

A typical Evernote note can contain en-todo elements at different levels of nesting. An en-todo tag is often followed by a span tag holding the text, but sometimes the en-todo and its text are both inside the same span tag. This makes parsing quite complicated.
Is there a simple way to remove all span tags from the XML while keeping their children? Can you suggest any other solutions?

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
<div>
<en-todo></en-todo>
<span>one task</span>
</div>

<div><span><br clear="none"/></span></div>

<div><span>
<en-todo></en-todo>
<span>second task</span>
<br clear="none"/>
<span><en-todo></en-todo>third task</span>
<br clear="none"/>
<span><en-todo></en-todo>forth task</span>
<br clear="none"/>
<span><en-todo></en-todo>fifth task</span>
</span></div>
...

Solution

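root.findall("en-todo") only looks at the direct children of root, which is why it finds nothing here. Use an XPath expression that searches all descendants instead: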

root.findall('.//en-todo')

This finds all en-todo elements at any level of nesting.
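
To get the text that belongs to each en-todo, something like the following might work. It is only a sketch using the standard xml.etree.ElementTree module on a shortened copy of the note from the question (XML declaration and DOCTYPE left out). The helper name todo_texts is made up for this example, and it assumes the text is either directly after the en-todo (its tail) or inside the next sibling element, which are the two layouts shown in the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the note from the question (declaration and DOCTYPE omitted).
ENML = """<en-note>
<div>
<en-todo></en-todo>
<span>one task</span>
</div>
<div><span>
<en-todo></en-todo>
<span>second task</span>
<br clear="none"/>
<span><en-todo></en-todo>third task</span>
</span></div>
</en-note>"""


def todo_texts(root):
    """Collect the text belonging to each en-todo (hypothetical helper).

    Assumes only the two layouts from the question; results are gathered
    parent by parent as the tree is walked.
    """
    texts = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag != "en-todo":
                continue
            # Layout 1: text directly after the checkbox,
            # e.g. <en-todo></en-todo>third task
            text = (child.tail or "").strip()
            # Layout 2: text wrapped in the next sibling,
            # e.g. <en-todo></en-todo><span>one task</span>
            if not text and i + 1 < len(children):
                text = "".join(children[i + 1].itertext()).strip()
            texts.append(text)
    return texts


root = ET.fromstring(ENML)
print(len(root.findall("en-todo")))     # 0: direct children of en-note only
print(len(root.findall(".//en-todo")))  # 3: any level of nesting
print(todo_texts(root))  # ['one task', 'second task', 'third task']
```

This avoids removing the span tags at all. If you do want to strip them, ElementTree has no built-in for that; lxml's etree.strip_tags(root, "span") removes the tags while keeping their text and children.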

Licensed under: CC-BY-SA with attribution