Python and ElementTree: return "inner XML" excluding the parent element

https://stackoverflow.com/questions/3443831

27-09-2019

Question

In Python 2.6, using ElementTree, what's a good way to fetch the XML (as a string) inside a particular element, like what you can do with innerHTML in HTML and JavaScript?

Here is a simplified sample of the XML node I'm starting with:

<label attr="foo" attr2="bar">This is some text <a href="foo.htm">and a link</a> in embedded HTML</label>

I'd like to end up with this string:

This is some text <a href="foo.htm">and a link</a> in embedded HTML

I've tried iterating over the parent node and concatenating the tostring() of its children, but that gave me only the subnodes:

# returns only subnodes (e.g. <a href="foo.htm">and a link</a>)
''.join([et.tostring(sub, encoding="utf-8") for sub in node])
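A Python 3 sketch of that attempt on the sample node suggests what is actually missing. It assumes Python 3, where encoding="unicode" makes tostring() return str. ElementTree's tostring() serializes each child together with its tail text, so the text after the link is already there. Only the text before the first child, which ElementTree keeps in node.text, is lost.

```python
import xml.etree.ElementTree as ET

# The sample node from the question.
node = ET.fromstring(
    '<label attr="foo" attr2="bar">This is some text '
    '<a href="foo.htm">and a link</a> in embedded HTML</label>'
)

# Same approach as the attempt above, but asking for str output (Python 3).
children_only = ''.join(ET.tostring(sub, encoding='unicode') for sub in node)

print(repr(children_only))
# -> '<a href="foo.htm">and a link</a> in embedded HTML'  (the tail of <a> is included)
print(repr(node.text))
# -> 'This is some text '  (the leading text the join never sees)
```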

I could hack together a solution with regular expressions, but I was hoping for something less hacky than this:

re.sub("</\w+?>\s*?$", "", re.sub("^\s*?<\w*?>", "", et.tostring(node, encoding="utf-8")))
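Besides being hacky, the regex approach is fragile. Here is a Python 3 sketch that reruns those two substitutions on the sample node. It assumes str output via encoding="unicode" and uses raw strings for the patterns. The opening-tag pattern <\w*?> only matches a bare tag name followed directly by ">", so it does not remove an opening tag that carries attributes, such as the one in the sample.

```python
import re
import xml.etree.ElementTree as ET

node = ET.fromstring(
    '<label attr="foo" attr2="bar">This is some text '
    '<a href="foo.htm">and a link</a> in embedded HTML</label>'
)

# The question's two substitutions, with raw strings and str output (Python 3).
serialized = ET.tostring(node, encoding='unicode')
hacked = re.sub(r"</\w+?>\s*?$", "", re.sub(r"^\s*?<\w*?>", "", serialized))

# The closing </label> is stripped, but the opening <label attr=...> tag
# survives, because <\w*?> cannot match a tag that has attributes.
print(hacked)
```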

Solution

How about:

from xml.etree import ElementTree as ET

xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>'
root = ET.fromstring(xml)

def content(tag):
    # tag.text is None when the element starts with a child, hence the `or ''`
    return (tag.text or '') + ''.join(ET.tostring(e) for e in tag)

print content(root)
print content(root.find('child2'))

Which produces:

start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here
here as well<sub2 /><sub3 />
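The code above is Python 2. In Python 3, ET.tostring() returns bytes by default, so ''.join(...) would raise a TypeError. Below is a minimal Python 3 port of content(). It assumes Python 3, where encoding="unicode" makes tostring() return str, and it guards against tag.text being None. Applied to the question's <label> node, it appears to give exactly the string the question asks for.

```python
import xml.etree.ElementTree as ET

def content(tag):
    # Leading text (None when the element starts with a child) plus each
    # child serialized as str; tostring() includes each child's tail text.
    return (tag.text or '') + ''.join(
        ET.tostring(e, encoding='unicode') for e in tag
    )

label = ET.fromstring(
    '<label attr="foo" attr2="bar">This is some text '
    '<a href="foo.htm">and a link</a> in embedded HTML</label>'
)
print(content(label))
# -> This is some text <a href="foo.htm">and a link</a> in embedded HTML
```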

Other answers

This builds on the other solutions, but they didn't work in my case (they raised exceptions), while this one did:

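# Element lives in the xml.etree.ElementTree module, not the xml.etree package
from xml.etree import ElementTree
from xml.etree.ElementTree import Element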

def inner_xml(element: Element):
    return (element.text or '') + ''.join(ElementTree.tostring(e, 'unicode') for e in element)

Use it the same way as in Mark Tolonen's answer.
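Below is a usage sketch in Python 3 on the sample document from the accepted answer. It repeats the function's logic so that it runs on its own. Calling it on a nested element shows that the element's own tail text is left out. For example, the "and" after </child1> belongs to child1's tail, not to its content. The second positional argument of tostring() is the encoding, so 'unicode' yields str.

```python
from xml.etree import ElementTree
from xml.etree.ElementTree import Element

def inner_xml(element: Element) -> str:
    # Same logic as the answer above: leading text plus serialized children
    # (each child's tail is emitted by tostring()).
    return (element.text or '') + ''.join(
        ElementTree.tostring(e, 'unicode') for e in element
    )

xml = ('<root>start here<child1>some text<sub1/>here</child1>and'
       '<child2>here as well<sub2/><sub3/></child2>end here</root>')
root = ElementTree.fromstring(xml)

print(inner_xml(root.find('child1')))  # -> some text<sub1 />here
print(inner_xml(root.find('child2')))  # -> here as well<sub2 /><sub3 />
```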

The following worked for me:

from xml.etree import ElementTree as etree
xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>'
dom = etree.XML(xml)

# dom.tail is text *after* </root>, i.e. outside the element, so it is not added
(dom.text or '') + ''.join(map(etree.tostring, dom))
# 'start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here'

dom.text or '' is used to get the text at the start of the root element: if there is no such text, dom.text is None.
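Here is a small Python 3 sketch of the None case. An element whose first content is a child element has no leading text, so its .text is None. Without the or '' guard, concatenating it with a string would raise a TypeError.

```python
import xml.etree.ElementTree as ET

# <root> starts directly with a child, so there is no leading text.
elem = ET.fromstring('<root><child/>tail text</root>')

print(elem.text)                # -> None
print(repr(elem.text or ''))    # -> ''  (the guard substitutes an empty string)
print(repr(elem[0].tail))       # -> 'tail text'  (text after <child/> is its tail)
```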

Note that the result is not valid XML: a well-formed XML document must have exactly one root element.
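Because the returned string can contain text and several top-level elements, ElementTree refuses to parse it on its own. One common workaround is to wrap it in a throwaway root element. This is sketched below under the assumption that you want to re-parse the fragment with ElementTree in Python 3. The element name wrapper is an arbitrary, hypothetical choice.

```python
import xml.etree.ElementTree as ET

inner = ('start here<child1>some text<sub1 />here</child1>and'
         '<child2>here as well<sub2 /><sub3 /></child2>end here')

try:
    ET.fromstring(inner)
except ET.ParseError as exc:
    # Text before the first tag is not allowed outside a root element.
    print('not well-formed on its own:', exc)

# Wrapping the fragment in a single (hypothetical) root makes it parseable.
wrapped = ET.fromstring('<wrapper>' + inner + '</wrapper>')
print([child.tag for child in wrapped])   # -> ['child1', 'child2']
```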

Have a look at the ElementTree documentation on mixed content.
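In ElementTree's model of mixed content, the text before an element's first child is stored in that element's .text. The text that follows a child's closing tag is stored in the child's .tail. A short Python 3 sketch on the question's sample shows where each piece of text lives. This is why the solutions above combine .text with tostring() of the children, which already carries their tails.

```python
import xml.etree.ElementTree as ET

label = ET.fromstring(
    '<label attr="foo" attr2="bar">This is some text '
    '<a href="foo.htm">and a link</a> in embedded HTML</label>'
)
a = label.find('a')

print(repr(label.text))  # -> 'This is some text '  (before the first child)
print(repr(a.text))      # -> 'and a link'          (inside <a>)
print(repr(a.tail))      # -> ' in embedded HTML'   (after </a>, still inside <label>)
print(repr(label.tail))  # -> None                  (nothing follows </label>)
```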

Tested with Python 2.6.5 on Ubuntu 10.04.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow