Python and ElementTree: return "inner XML" excluding the parent element

https://stackoverflow.com/questions/3443831

27-09-2019

Question

In Python 2.6 using ElementTree, what is a good way to fetch the XML (as a string) inside a particular element, like what you can do in HTML and JavaScript with innerHTML?

Here is a simplified example of the XML node I am starting with:

<label attr="foo" attr2="bar">This is some text <a href="foo.htm">and a link</a> in embedded HTML</label>

I would like to end up with this string:

This is some text <a href="foo.htm">and a link</a> in embedded HTML

I tried iterating over the parent node and concatenating the tostring() of the children, but that gave me only the subnodes:

# returns only subnodes (e.g. <a href="foo.htm">and a link</a>)
''.join([et.tostring(sub, encoding="utf-8") for sub in node])

I can hack together a solution using regular expressions, but I was hoping there would be something less hacky than this:

re.sub("</\w+?>\s*?$", "", re.sub("^\s*?<\w*?>", "", et.tostring(node, encoding="utf-8")))
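For context on why joining tostring() over the children drops the leading text: ElementTree stores mixed content in two places, the parent's text attribute (text before the first child) and each child's tail attribute (text after that child's closing tag). The following is a minimal sketch, assuming Python 3 and the label node from the question; the variable names are illustrative only.

```python
from xml.etree import ElementTree as ET

node = ET.fromstring(
    '<label attr="foo" attr2="bar">This is some text '
    '<a href="foo.htm">and a link</a> in embedded HTML</label>'
)

# Text before the first child lives on the parent element.
leading_text = node.text

# Text after a child's closing tag lives on that child as its tail.
child = node[0]
child_tail = child.tail

# tostring() on a child serializes the child plus its tail,
# but never the parent's leading text.
serialized_child = ET.tostring(child, encoding="unicode")
```

So the only piece missing from the child-joining attempt is node.text, which has to be prepended separately.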

Solution

How about:

from xml.etree import ElementTree as ET

xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>'
root = ET.fromstring(xml)

def content(tag):
    return tag.text + ''.join(ET.tostring(e) for e in tag)

print content(root)
print content(root.find('child2'))

Resulting in:

start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here
here as well<sub2 /><sub3 />

Other suggestions

This is based on the other solutions, but the other solutions did not work in my case (they raised exceptions) and this one did:

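from xml.etree import ElementTree
from xml.etree.ElementTree import Element  # Element lives in the ElementTree module, not in the xml.etree package

def inner_xml(element: Element):
    # element.text is None when there is no text before the first child
    return (element.text or '') + ''.join(ElementTree.tostring(e, 'unicode') for e in element)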

Use it the same way as in Mark Tolonen's answer.

The following worked for me:
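As a usage sketch, assuming Python 3, here is inner_xml applied to the label node from the question and to an element with no leading text. The wrapper and item element names are made up for illustration.

```python
from xml.etree import ElementTree
from xml.etree.ElementTree import Element


def inner_xml(element: Element) -> str:
    # Leading text (or '' when it is None) plus each child serialized with its tail.
    return (element.text or '') + ''.join(
        ElementTree.tostring(e, encoding='unicode') for e in element
    )


label = ElementTree.fromstring(
    '<label attr="foo" attr2="bar">This is some text '
    '<a href="foo.htm">and a link</a> in embedded HTML</label>'
)
result = inner_xml(label)

# An element whose content starts with a child has text == None;
# the "or ''" guard keeps the concatenation from failing.
no_leading_text = inner_xml(ElementTree.fromstring('<wrapper><item/></wrapper>'))
```

Here result is the string asked for in the question, and no_leading_text is just the serialized child.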

from xml.etree import ElementTree as etree
xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>'
dom = etree.XML(xml)

(dom.text or '') + ''.join(map(etree.tostring, dom)) + (dom.tail or '')
# 'start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here'

dom.text or '' is used to get the text at the start of the root element. If there is no text, dom.text is None.

Note that the result is not valid XML: a valid XML document must have exactly one root element.

Have a look at the ElementTree documentation on mixed content.

Using Python 2.6.5, Ubuntu 10.04
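A small sketch of why the guard matters, assuming Python 3: when an element's content begins with a child, its text attribute is None, and concatenating None with a string raises TypeError. The sample document below is illustrative.

```python
from xml.etree import ElementTree as ET

# The root's content starts with a child element, so there is no leading text.
root = ET.fromstring('<root><child/>trailing</root>')

text_is_none = root.text is None

# Concatenating None with a string fails, which is what the guard avoids.
try:
    root.text + ''
    raised_type_error = False
except TypeError:
    raised_type_error = True

# The "or ''" idiom substitutes an empty string for None.
safe_prefix = root.text or ''
```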
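To illustrate, here is a sketch assuming Python 3: feeding such an inner-XML fragment back to the parser fails, while wrapping it in a single element makes it parseable again. The wrapper element name is a placeholder chosen for this example.

```python
from xml.etree import ElementTree as ET

fragment = 'start here<child1>some text</child1>and<child2/>end here'

# A fragment with leading text and several top-level elements
# is rejected by the parser.
try:
    ET.fromstring(fragment)
    fragment_parsed = True
except ET.ParseError:
    fragment_parsed = False

# Wrapping the fragment in one root element (hypothetical name) fixes that.
wrapped = ET.fromstring('<wrapper>' + fragment + '</wrapper>')
child_tags = [child.tag for child in wrapped]
```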
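Under Python 3, tostring() returns bytes by default, so joining its results with str values raises TypeError. A possible adaptation, sketched here assuming Python 3, passes encoding='unicode' to get str back. The root's tail is None after parsing a standalone document, so it is left out.

```python
from xml.etree import ElementTree as etree

xml = ('<root>start here<child1>some text<sub1/>here</child1>and'
       '<child2>here as well<sub2/><sub3/></child2>end here</root>')
dom = etree.XML(xml)

# Default serialization yields bytes in Python 3.
default_is_bytes = isinstance(etree.tostring(dom), bytes)

# encoding='unicode' makes tostring() return str, so the pieces can be joined.
inner = (dom.text or '') + ''.join(
    etree.tostring(child, encoding='unicode') for child in dom
)
```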

Licensed under: CC-BY-SA with attribution
