Parse XML with prefixes on tags? (xml.etree.ElementTree)
Question
I can read tags, except when there is a prefix. I'm not having luck searching SO for a previous question.
I need to read media:content. I tried image = node.find("media:content").
RSS input:
<channel>
<title>Popular Photography in the last 1 week</title>
<item>
<title>foo</title>
<media:category label="Miscellaneous">photography/misc</media:category>
<media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
</item>
<item> ... </item>
</channel>
I can read a sibling tag, title.
from xml.etree import ElementTree
with open('cache1.rss', 'rt') as f:
    tree = ElementTree.parse(f)
for node in tree.findall('.//channel/item'):
    title = node.find("title").text
I've been using the docs, yet I'm stuck on the 'prefix' part.
Solution
Here's an example of using XML namespaces with ElementTree:
>>> x = '''\
<channel xmlns:media="http://www.w3.org/TR/html4/">
<title>Popular Photography in the last 1 week</title>
<item>
<title>foo</title>
<media:category label="Miscellaneous">photography/misc</media:category>
<media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
</item>
<item> ... </item>
</channel>
'''
>>> from xml.etree import ElementTree
>>> node = ElementTree.fromstring(x)
>>> for elem in node.findall('item/{http://www.w3.org/TR/html4/}category'):
...     print(elem.text)
...
photography/misc
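The same lookup can also be written with a prefix-to-URI mapping passed as the namespaces argument of find() and findall(), which keeps the paths readable. The sketch below is only an illustration: it reuses the namespace URI from the sample above (a real feed will declare its own URI), and the names NS and results are made up for this example.

```python
from xml.etree import ElementTree

# Sample document; the namespace URI is the one used in the example above,
# not necessarily the one a real feed declares.
XML = '''<channel xmlns:media="http://www.w3.org/TR/html4/">
<title>Popular Photography in the last 1 week</title>
<item>
<title>foo</title>
<media:category label="Miscellaneous">photography/misc</media:category>
<media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
</item>
</channel>'''

# Hypothetical mapping: the key is the prefix used in the search paths,
# the value must match the URI declared in the document.
NS = {"media": "http://www.w3.org/TR/html4/"}

channel = ElementTree.fromstring(XML)
results = []
for item in channel.findall("item"):
    title = item.find("title").text
    content = item.find("media:content", NS)
    # media:content is an empty element, so the image URL is an attribute.
    url = content.get("url") if content is not None else None
    results.append((title, url))

print(results)
```

The prefix in the mapping only has to match the prefix used in the search path; what ElementTree compares against the document is the URI.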
Other suggestions
media is an XML namespace prefix; it has to be defined somewhere earlier in the document with xmlns:media="...". See http://lxml.de/xpathxslt.html#namespaces-and-prefixes for how to define XML namespaces for use in XPath expressions in lxml.
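If it is unclear which URI a file binds to the media prefix, the standard library can report the declarations it encounters. This is a rough sketch using ElementTree.iterparse with the "start-ns" event; the feed text, the example.com namespace URI, and the helper name declared_namespaces are placeholders invented for illustration, not taken from the question's file.

```python
import io
from xml.etree import ElementTree

# Hypothetical feed; the namespace URI is a made-up placeholder.
FEED = b'''<rss xmlns:media="http://example.com/media-ns">
<channel>
<item>
<title>foo</title>
<media:content url="http://foo.com/1.jpg" medium="image"/>
</item>
</channel>
</rss>'''


def declared_namespaces(source):
    """Collect prefix -> URI pairs from the xmlns declarations in a document."""
    namespaces = {}
    # Each "start-ns" event carries a (prefix, uri) tuple.
    for _event, (prefix, uri) in ElementTree.iterparse(source, events=("start-ns",)):
        namespaces[prefix] = uri
    return namespaces


ns = declared_namespaces(io.BytesIO(FEED))
root = ElementTree.fromstring(FEED)
# Reuse the discovered mapping so the search path can keep the prefix.
urls = [c.get("url") for c in root.findall("channel/item/media:content", ns)]
print(ns, urls)
```

Reading the mapping from the file avoids hard-coding the URI, at the cost of parsing the document twice in this simple form.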
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow