Getting a list of XML attribute values in Python

https://stackoverflow.com/questions/87317

01-07-2019

Question

I need to get a list of attribute values from child elements in Python.

It's easiest to explain with an example.

Given some XML like this:

<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>

I'd like to be able to do something like:

>>> getValues("CategoryA")
['a1', 'a2', 'a3']
>>> getValues("CategoryB")
['b1', 'b2', 'b3']

It looks like a job for XPath, but I'm open to all suggestions. I'd also like to hear about your favourite Python XML libraries.

Solution

I'm not much of a Python veteran, but here is an XPath solution using libxml2:

import libxml2

DOC = """<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>"""

doc = libxml2.parseDoc(DOC)

def getValues(cat):
    # Select the value attribute of every <child> under the matching <parent>
    xpath = "/elements/parent[@name='%s']/child/@value" % cat
    return [attr.content for attr in doc.xpathEval(xpath)]

print(getValues("CategoryA"))

Result:

['a1', 'a2', 'a3']
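Because the category is spliced into the XPath string with `%`, a name containing a single quote (for example `it's`) yields a malformed expression. A minimal sketch of one way around that, swapping in the standard library's `xml.etree.ElementTree` instead of libxml2 and comparing the `name` attribute in Python rather than inside the path. Returning an empty list for an unknown category is an assumption; raising would be equally valid.

```python
import xml.etree.ElementTree as ET

DOC = """<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>"""

root = ET.fromstring(DOC)

def get_values(category):
    # Compare the name in Python instead of building a path string,
    # so quotes inside `category` cannot break anything.
    for parent in root.iter("parent"):
        if parent.get("name") == category:
            # findall("child") returns only direct <child> elements
            return [child.get("value") for child in parent.findall("child")]
    return []  # assumption: unknown category -> empty list

print(get_values("CategoryA"))  # ['a1', 'a2', 'a3']
print(get_values("it's"))  # []
```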

Other answers

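ElementTree 1.3 supports XPath-style predicates like this (unfortunately this is not the 1.2 version that shipped with Python at the time; since Python 2.7, the standard library's xml.etree.ElementTree is based on 1.3):

import xml.etree.ElementTree as ET

def getValues(tree, category):
    # [@name='...'] predicates require ElementTree 1.3 or later
    parent = tree.find(".//parent[@name='%s']" % category)
    return [child.get('value') for child in parent]

Then you can do:

>>> tree = ET.parse('data.xml')
>>> getValues(tree, 'CategoryA')
['a1', 'a2', 'a3']
>>> getValues(tree, 'CategoryB')
['b1', 'b2', 'b3']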

lxml.etree (which also provides the ElementTree interface) works the same way.
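In the ElementTree getValues above, find() returns None when no parent matches, so iterating over it then fails with a TypeError. A self-contained Python 3 sketch of the same predicate approach using the standard library, with the sample XML inlined through `ET.fromstring` (instead of reading `data.xml`) and `iterfind` so a missing category simply produces an empty list:

```python
import xml.etree.ElementTree as ET

DOC = """<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>"""

def get_values(root, category):
    # The [@name='...'] predicate picks the parent; the trailing /child
    # step then selects that parent's <child> elements.
    path = ".//parent[@name='%s']/child" % category
    # iterfind yields nothing when no parent matches, so there is no
    # None to guard against as there is with find()
    return [child.get("value") for child in root.iterfind(path)]

root = ET.fromstring(DOC)
print(get_values(root, "CategoryA"))  # ['a1', 'a2', 'a3']
print(get_values(root, "CategoryC"))  # []
```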

Using a standard W3C DOM implementation, such as the stdlib's minidom or pxdom:

from xml.dom import minidom

document = minidom.parse('data.xml')  # assumes the sample XML is saved as data.xml

def getValues(category):
    for parent in document.getElementsByTagName('parent'):
        if parent.getAttribute('name') == category:
            return [
                el.getAttribute('value')
                for el in parent.getElementsByTagName('child')
            ]
    raise ValueError('parent not found')
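One subtlety of the DOM version: getElementsByTagName searches all descendants, not just direct children, so nested `child` elements would also be collected. A minimal sketch that walks `childNodes` instead, using a hypothetical variant of the sample with one nested `child` added; the `direct_child_values` helper name is illustrative only:

```python
from xml.dom import minidom

# Hypothetical variant of the sample XML with a nested <child>
DOC = """<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"><child value="nested"/></child>
    </parent>
</elements>"""

document = minidom.parseString(DOC)

def direct_child_values(parent, tag="child"):
    # childNodes also holds whitespace text nodes, so keep only elements
    return [
        node.getAttribute("value")
        for node in parent.childNodes
        if node.nodeType == node.ELEMENT_NODE and node.tagName == tag
    ]

parent = document.getElementsByTagName("parent")[0]
print([el.getAttribute("value") for el in parent.getElementsByTagName("child")])
# ['a1', 'a2', 'nested']
print(direct_child_values(parent))
# ['a1', 'a2']
```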

I have to admit I'm a fan of xmltramp because of its ease of use.

Accessing the above becomes:

import xmltramp

values = xmltramp.parse('''...''')

def getValues(values, category):
    # values['parent':] lists every <parent>; parent('name') reads an attribute
    cat = [parent for parent in values['parent':] if parent('name') == category]
    cat_values = [child('value') for parent in cat for child in parent['child':]]
    return cat_values

getValues(values, "CategoryA")
getValues(values, "CategoryB")

You can do this with BeautifulSoup:

>>> from BeautifulSoup import BeautifulStoneSoup
>>> soup = BeautifulStoneSoup(xml)
>>> def getValues(name):
...     return [child['value'] for child in soup.find('parent', attrs={'name': name}).findAll('child')]

If you're working with HTML or XML, I'd recommend taking a look at BeautifulSoup. It is similar to a DOM tree but has more features.

My preferred Python XML library is lxml, which wraps libxml2.
XPath does seem to be the way to solve this, so I'd write it like this:

from lxml import etree

def getValues(tree, category):
    # Path is relative to the root <elements>; a leading '/' makes lxml emit a FutureWarning
    return [x.attrib['value'] for x in
            tree.findall('parent[@name="%s"]/*' % category)]

tree = etree.parse('filename.xml')

>>> print(getValues(tree, 'CategoryA'))
['a1', 'a2', 'a3']
>>> print(getValues(tree, 'CategoryB'))
['b1', 'b2', 'b3']
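If getValues is called for many categories, each call searches the tree again. A sketch that instead builds a category-to-values mapping in a single pass, written against the standard library's `xml.etree.ElementTree` (lxml.etree offers the same `iter`, `findall` and `get` methods); the `values_by_category` name is illustrative, not part of either library:

```python
import xml.etree.ElementTree as ET

DOC = """<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>"""

def values_by_category(root):
    # One pass over every <parent>; if two parents share a name,
    # the later one overwrites the earlier entry.
    return {
        parent.get("name"): [child.get("value") for child in parent.findall("child")]
        for parent in root.iter("parent")
    }

mapping = values_by_category(ET.fromstring(DOC))
print(mapping["CategoryA"])  # ['a1', 'a2', 'a3']
print(mapping["CategoryB"])  # ['b1', 'b2', 'b3']
```

Lookups after that are plain dictionary accesses, at the cost of holding every category's values in memory.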

In Python 3.x, getting an element's list of attributes is a simple task using its items() method.

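With ElementTree, the snippet below shows one way to get the attribute list. Note that this example does not account for namespaces; if any are present, they need to be handled.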

import xml.etree.ElementTree as ET

flName = 'test.xml'
tree = ET.parse(flName)
root = tree.getroot()
# Replace '<child-node-of-root>' with the tag name to search for
for element in root.findall('<child-node-of-root>'):
    attrList = element.items()  # list of (name, value) pairs
    print(len(attrList), " : [", attrList, "]")

Reference:

Element.items()
Returns the element attributes as a sequence of (name, value) pairs.
The attributes are returned in an arbitrary order.

Python manual
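The namespace caveat in this answer can be made concrete: ElementTree reports a prefixed attribute under its expanded {uri}local name, not under the prefix written in the file. A minimal sketch, assuming a made-up `urn:example` namespace and an `x:id` attribute added to the sample (neither appears in the question's XML):

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the sample with a namespaced x:id attribute
DOC = """<elements xmlns:x="urn:example">
    <parent name="CategoryA" x:id="1">
        <child value="a1"/>
    </parent>
</elements>"""

root = ET.fromstring(DOC)
for element in root.findall("parent"):
    attrs = dict(element.items())  # same contents as element.attrib
    # The prefixed attribute is keyed as '{urn:example}id', not 'x:id'
    print(attrs["name"], attrs["{urn:example}id"])  # CategoryA 1
```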

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow