Get a list of XML attribute values in Python

https://stackoverflow.com/questions/87317

01-07-2019

Question

I need to get a list of attribute values from child elements in Python.

It is easiest to explain with an example.

Given some XML like this:

<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>

I want to be able to do something like this:

>>> getValues("CategoryA")
['a1', 'a2', 'a3']
>>> getValues("CategoryB")
['b1', 'b2', 'b3']

It looks like a job for XPath, but I am open to all recommendations. I would also like to hear about your favourite Python XML libraries.

Solution

I'm not really an old hand at Python, but here is an XPath solution using libxml2.

import libxml2

DOC = """<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>"""

doc = libxml2.parseDoc(DOC)

def getValues(cat):
    return [attr.content for attr in doc.xpathEval("/elements/parent[@name='%s']/child/@value" % (cat))]

print(getValues("CategoryA"))

Result:

['a1', 'a2', 'a3']

Other answers

ElementTree 1.3 (unfortunately not 1.2, which is the version included with Python) supports XPath like this:

import elementtree.ElementTree as xml

def getValues(tree, category):
    parent = tree.find(".//parent[@name='%s']" % category)
    return [child.get('value') for child in parent]

Then you can do:

>>> tree = xml.parse('data.xml')
>>> getValues(tree, 'CategoryA')
['a1', 'a2', 'a3']
>>> getValues(tree, 'CategoryB')
['b1', 'b2', 'b3']

lxml.etree (which also provides the ElementTree interface) will work in the same way.
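For readers on a current Python, the same lookup can be sketched with the standard library's xml.etree.ElementTree, which has shipped the ElementTree 1.3 API (including the [@attrib='value'] predicate) since Python 2.7 and 3.2. The name get_values and the inline SAMPLE string are illustrative assumptions, not part of the answer above.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
SAMPLE = """<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>"""


def get_values(root, category):
    """Return the 'value' attribute of every child under the named parent."""
    # The category is interpolated into the path, so this sketch assumes
    # it does not contain a single quote.
    path = "./parent[@name='%s']/child" % category
    return [child.get("value") for child in root.findall(path)]


root = ET.fromstring(SAMPLE)
print(get_values(root, "CategoryA"))
print(get_values(root, "CategoryB"))
```

Because findall() returns an empty list when nothing matches, this version returns [] for an unknown category instead of raising an error.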

Using a standard W3C DOM such as the stdlib's minidom, or pxdom:

def getValues(category):
    for parent in document.getElementsByTagName('parent'):
        if parent.getAttribute('name')==category:
            return [
                el.getAttribute('value')
                for el in parent.getElementsByTagName('child')
            ]
    raise ValueError('parent not found')
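The function above relies on a document object that is not shown. A minimal sketch of how it might be created with minidom follows; the inline SAMPLE string and the name get_values_dom are assumptions made for illustration.

```python
from xml.dom import minidom

# Sample document from the question, inlined so the sketch is self-contained.
SAMPLE = """<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>"""

# parseString() builds the DOM from a string; minidom.parse() takes a file.
document = minidom.parseString(SAMPLE)


def get_values_dom(category):
    """Return the child values of the first parent whose name matches."""
    for parent in document.getElementsByTagName("parent"):
        if parent.getAttribute("name") == category:
            return [
                el.getAttribute("value")
                for el in parent.getElementsByTagName("child")
            ]
    raise ValueError("parent not found: %r" % category)


print(get_values_dom("CategoryB"))
```

Since the comparison happens in Python rather than inside a path expression, category names that contain quote characters need no special handling here.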

I must admit I am a fan of xmltramp because of its ease of use.

Accessing the above becomes:

  import xmltramp

  values = xmltramp.parse('''...''')

  def getValues( values, category ):
    cat = [ parent for parent in values['parent':] if parent('name') == category ]
    cat_values = [ child('value') for parent in cat for child in parent['child':] ]
    return cat_values

  getValues( values, "CategoryA" )
  getValues( values, "CategoryB" )

You can do this with BeautifulSoup:

>>> from BeautifulSoup import BeautifulStoneSoup
>>> soup = BeautifulStoneSoup(xml)
>>> def getValues(name):
...     return [child['value'] for child in soup.find('parent', attrs={'name': name}).findAll('child')]

If you work with HTML/XML, I recommend you take a look at BeautifulSoup. It is similar to a DOM tree but offers more functionality.

My preferred Python XML library is lxml, which wraps libxml2.
XPath does seem the way to go here, so I would write this as something like:

from lxml import etree

def getValues(xml, category):
    return [x.attrib['value'] for x in 
            xml.findall('/parent[@name="%s"]/*' % category)]

xml = etree.parse(open('filename.xml'))

>>> print(getValues(xml, 'CategoryA'))
['a1', 'a2', 'a3']
>>> print(getValues(xml, 'CategoryB'))
['b1', 'b2', 'b3']

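In Python 3.x, fetching a list of attributes is a simple task using the member items().

Using ElementTree, the snippet below shows one way to get the list of attributes. Note that this example does not consider namespaces, which, if present, need to be accounted for.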

    import xml.etree.ElementTree as ET

    flName = 'test.xml'
    tree = ET.parse(flName)
    root = tree.getroot()
    for element in root.findall('<child-node-of-root>'):
        attrList = element.items()
        print(len(attrList), " : [", attrList, "]" )

Reference:

Element.items()
Returns the element attributes as a sequence of (name, value) pairs.
The attributes are returned in an arbitrary order.

Python documentation
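Since the snippet above leaves namespaces aside, here is a minimal sketch of one way to account for them with ElementTree's prefix mapping. The namespace URI http://example.com/ns, the prefix ns, and the shortened NS_SAMPLE document are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the sample document with a default namespace.
NS_SAMPLE = """<elements xmlns="http://example.com/ns">
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
    </parent>
</elements>"""

# Map a prefix of our choosing to the namespace URI used in the document.
NS = {"ns": "http://example.com/ns"}

root = ET.fromstring(NS_SAMPLE)

# Element names need the prefix in the path; unprefixed attributes such as
# 'value' are not in any namespace, so items() reports them unchanged.
attr_lists = [
    list(child.items())
    for child in root.findall("ns:parent/ns:child", NS)
]
print(attr_lists)
```

The prefix used in the path does not have to match anything in the document; only the URI it maps to matters.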

License: CC-BY-SA with attribution
