Navigating to a node with XPath in a flat structure

https://stackoverflow.com/questions/614370

03-07-2019

Problem

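I have an XML file with a flat structure. We don't control the format of this XML file, but we have to process it. I've renamed the fields because they are highly domain-specific and make no real difference to the problem.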

<attribute name="Title">Book A</attribute>
<attribute name="Code">1</attribute>
<attribute name="Author">
   <value>James Berry</value>
   <value>John Smith</value>
</attribute>
<attribute name="Title">Book B</attribute>
<attribute name="Code">2</attribute>
<attribute name="Title">Book C</attribute>
<attribute name="Code">3</attribute>
<attribute name="Author">
    <value>James Berry</value>
</attribute>

Things to note: the file is not really hierarchical. Books are delimited by occurrences of an attribute element with name='Title', and the attribute node with name='Author' is optional.

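Is there a simple XPath statement I can use to find the authors of book 'N'? I can easily identify the title of book 'N', but the Author value is optional, so I can't simply take the following Author: for book 2 that would give me the authors of book 3.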

I've written a state machine that parses this as a series of elements, but I can't help thinking there must be a way to get the result I want directly.

Solution

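We want the "attribute" element with @name 'Author' that follows an "attribute" element with @name 'Title' whose value is 'Book N', with no other "attribute" element with @name 'Title' between them (if there were one, the author would have written another book).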

Put differently, we want the Author whose first preceding Title (the one it "belongs to") is the one we're looking for:

//attribute[@name='Author']
[preceding-sibling::attribute[@name='Title'][1][contains(.,'Book N')]]

N = C => finds <attribute name="Author"><value>James Berry</value></attribute>

N = B => finds nothing
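Python's xml.etree.ElementTree supports only a small XPath subset with no preceding-sibling axis, so the expression above cannot be run there directly. As a rough sketch, the same "first preceding Title" rule can be applied in one forward pass: remember the most recent Title and attach each Author to it. The `<t>` root element and the helper name `authors_of` are assumptions for illustration, and unlike contains() the sketch compares titles for exact equality.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in an assumed <t> root element.
SAMPLE = """<t>
  <attribute name="Title">Book A</attribute>
  <attribute name="Code">1</attribute>
  <attribute name="Author"><value>James Berry</value><value>John Smith</value></attribute>
  <attribute name="Title">Book B</attribute>
  <attribute name="Code">2</attribute>
  <attribute name="Title">Book C</attribute>
  <attribute name="Code">3</attribute>
  <attribute name="Author"><value>James Berry</value></attribute>
</t>"""

def authors_of(root, title):
    """Hypothetical helper: author names whose nearest preceding Title is `title`.

    Mirrors the predicate preceding-sibling::attribute[@name='Title'][1],
    but compares with == instead of contains().
    """
    result = []
    current_title = None  # text of the nearest preceding Title seen so far
    for attr in root.findall('attribute'):
        name = attr.get('name')
        if name == 'Title':
            current_title = attr.text
        elif name == 'Author' and current_title == title:
            result.extend(v.text for v in attr.findall('value'))
    return result

root = ET.fromstring(SAMPLE)
print(authors_of(root, 'Book C'))  # ['James Berry']
print(authors_of(root, 'Book B'))  # []
```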

Keys (available since XSLT 1.0) and/or the grouping features of XSLT 2.0 would make this easier, and much faster if the file is large.
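For the grouping route, XSLT 2.0's xsl:for-each-group with group-starting-with splits a flat sequence into groups that each begin at a matching item. A minimal Python analogue is sketched below, assuming the same wrapped sample and hypothetical helper names; it only illustrates the idea, not how an XSLT processor implements grouping.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in an assumed <t> root element.
SAMPLE = """<t>
  <attribute name="Title">Book A</attribute>
  <attribute name="Code">1</attribute>
  <attribute name="Author"><value>James Berry</value><value>John Smith</value></attribute>
  <attribute name="Title">Book B</attribute>
  <attribute name="Code">2</attribute>
  <attribute name="Title">Book C</attribute>
  <attribute name="Code">3</attribute>
  <attribute name="Author"><value>James Berry</value></attribute>
</t>"""

def group_starting_with_title(root):
    """Split root's flat children into groups that each start at a Title.

    Rough analogue of group-starting-with="attribute[@name='Title']":
    any elements before the first Title form a leading group of their own.
    """
    groups = []
    for attr in root.findall('attribute'):
        if attr.get('name') == 'Title' or not groups:
            groups.append([])  # start a new group
        groups[-1].append(attr)
    return groups

def authors_in_group(groups, title):
    """Hypothetical helper: authors of the group headed by `title`, or None."""
    for group in groups:
        head = group[0]
        if head.get('name') == 'Title' and head.text == title:
            return [v.text
                    for attr in group if attr.get('name') == 'Author'
                    for v in attr.findall('value')]
    return None  # no group starts with this title

groups = group_starting_with_title(ET.fromstring(SAMPLE))
print(authors_in_group(groups, 'Book B'))  # []
print(authors_in_group(groups, 'Book C'))  # ['James Berry']
```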

(By the way, the code highlighter seems to treat '//' as the start of a comment, but in XPath it isn't one. Sigh.)

Other answers

Well, I used ElementTree to extract the data from the XML above. I saved the XML in a file called foo.xml.

import xml.etree.ElementTree as ET

def extract_data(path='foo.xml'):
    """Return a dict mapping each book title to its list of authors.

    The file must have a single root element wrapping the flat
    <attribute> elements, otherwise it is not well-formed XML.
    """
    root = ET.parse(path).getroot()  # parse() opens and closes the file
    books = {}
    title = None  # most recent Title seen
    for attribute in root.findall('attribute'):
        name = attribute.get('name')
        if name == 'Title':
            # Every Title starts a new book.
            title = attribute.text
        elif name == 'Author' and title is not None:
            # An Author belongs to the most recent Title.
            books[title] = [v.text for v in attribute.findall('value')]
    return books

Running this function gives:

{'Book A': ['James Berry', 'John Smith'], 'Book C': ['James Berry']}

I hope this is what you're looking for. If not, please be a bit more specific. :)

As Bambax mentioned in his answer, a solution using XSLT keys is more efficient, especially for large XML documents:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>
 <!--                                            -->
 <xsl:key name="kAuthByTitle" 
  match="attribute[@name='Author']"
  use="preceding-sibling::attribute[@name='Title'][1]"/>
 <!--                                            -->
    <xsl:template match="/">
      Book C Author:
      <xsl:copy-of select=
         "key('kAuthByTitle', 'Book C')"/>
  <!--                                            -->
         ====================
      Book B Author:
      <xsl:copy-of select=
         "key('kAuthByTitle', 'Book B')"/>
    </xsl:template>
</xsl:stylesheet>
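Conceptually, xsl:key lets the processor build an index once and answer each key() call by lookup. The sketch below imitates kAuthByTitle with a dictionary keyed by the text of each Author's first preceding Title. The function names build_auth_by_title and lookup are hypothetical, and real XSLT processors may build and use their indexes differently.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample data from the question, wrapped in an assumed <t> root element.
SAMPLE = """<t>
  <attribute name="Title">Book A</attribute>
  <attribute name="Code">1</attribute>
  <attribute name="Author"><value>James Berry</value><value>John Smith</value></attribute>
  <attribute name="Title">Book B</attribute>
  <attribute name="Code">2</attribute>
  <attribute name="Title">Book C</attribute>
  <attribute name="Code">3</attribute>
  <attribute name="Author"><value>James Berry</value></attribute>
</t>"""

def build_auth_by_title(root):
    """Index Author elements by the text of their first preceding Title.

    Loosely imitates the kAuthByTitle key: one pass over the document,
    after which each lookup is a dictionary access.
    """
    index = defaultdict(list)
    last_title = None  # text of the nearest preceding Title
    for attr in root.findall('attribute'):
        name = attr.get('name')
        if name == 'Title':
            last_title = attr.text
        elif name == 'Author' and last_title is not None:
            index[last_title].append(attr)
    return index

def lookup(index, title):
    """Hypothetical stand-in for key('kAuthByTitle', title); [] if no match."""
    return index.get(title, [])

idx = build_auth_by_title(ET.fromstring(SAMPLE))
for author in lookup(idx, 'Book C'):
    print([v.text for v in author])  # ['James Berry']
print(lookup(idx, 'Book B'))  # []
```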

When the transformation above is applied to this XML document:

<t>
    <attribute name="Title">Book A</attribute>
    <attribute name="Code">1</attribute>
    <attribute name="Author">
        <value>James Berry</value>
        <value>John Smith</value>
    </attribute>
    <attribute name="Title">Book B</attribute>
    <attribute name="Code">2</attribute>
    <attribute name="Title">Book C</attribute>
    <attribute name="Code">3</attribute>
    <attribute name="Author">
        <value>James Berry</value>
    </attribute>
</t>

the correct output is produced:

  Book C Author:
  <attribute name="Author">
    <value>James Berry</value>
</attribute>

     ====================
  Book B Author:

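Do note that the "//" XPath abbreviation should be avoided as much as possible, because it typically causes the entire XML document to be scanned on every evaluation of the XPath expression.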
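To make the difference concrete in Python terms (an analogy only, not a measurement of any XSLT processor): in ElementTree, the path 'attribute' inspects only the root's children, while './/attribute' walks every descendant, including the nested <value> elements. The `<t>` root element is an assumption.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in an assumed <t> root element.
SAMPLE = """<t>
  <attribute name="Title">Book A</attribute>
  <attribute name="Code">1</attribute>
  <attribute name="Author"><value>James Berry</value><value>John Smith</value></attribute>
  <attribute name="Title">Book B</attribute>
  <attribute name="Code">2</attribute>
  <attribute name="Title">Book C</attribute>
  <attribute name="Code">3</attribute>
  <attribute name="Author"><value>James Berry</value></attribute>
</t>"""

root = ET.fromstring(SAMPLE)

# 'attribute' looks only at the direct children of root.
children_only = root.findall('attribute')
# './/attribute' is ElementTree's spelling of a descendant search like '//'.
anywhere = root.findall('.//attribute')

# Same elements, in the same order, for this flat file...
print(children_only == anywhere)  # True
# ...but a descendant search walks the whole tree (12 elements here,
# counting root and the nested <value> elements), not just root's 8 children.
print(len(root), sum(1 for _ in root.iter()))  # 8 12
```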

Select all the titles and apply templates to them:

<xsl:template match="/">
  <xsl:apply-templates select="//attribute[@name='Title']"/>
</xsl:template>

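In the template, output the title and check whether there is a following title. If there isn't, output the following author. If there is, check whether the next author node after the next book is the same node as the next author node after the current book; if so, the current book has no author.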

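<xsl:template match="attribute[@name='Title']">
   <book>
     <title><xsl:value-of select="."/></title>
     <author>
       <!-- Compare node identity with generate-id(): a string comparison
            with != would also be true whenever any two author values differ -->
       <xsl:if test="not(following::attribute[@name='Title'])
                     or generate-id(following::attribute[@name='Author'][1])
                        != generate-id(following::attribute[@name='Title'][1]/following::attribute[@name='Author'][1])">
         <xsl:value-of select="following::attribute[@name='Author'][1]"/>
       </xsl:if>
     </author>
   </book>
</xsl:template>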
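The same "is the next Author shared with the next Title?" rule can be sketched in Python. It compares element positions, which plays the role of node identity (like generate-id() in XSLT 1.0) rather than comparing string values, so two books with identical author text are not confused. The helper names are hypothetical and the sample is wrapped in an assumed `<t>` root.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in an assumed <t> root element.
SAMPLE = """<t>
  <attribute name="Title">Book A</attribute>
  <attribute name="Code">1</attribute>
  <attribute name="Author"><value>James Berry</value><value>John Smith</value></attribute>
  <attribute name="Title">Book B</attribute>
  <attribute name="Code">2</attribute>
  <attribute name="Title">Book C</attribute>
  <attribute name="Code">3</attribute>
  <attribute name="Author"><value>James Berry</value></attribute>
</t>"""

def first_after(items, start, name):
    """Hypothetical helper: index of the first item after `start` with @name == name, or None."""
    for i in range(start + 1, len(items)):
        if items[i].get('name') == name:
            return i
    return None

def books_with_authors(root):
    """Return (title, authors) pairs using the next-Author identity check."""
    items = root.findall('attribute')
    result = []
    for i, item in enumerate(items):
        if item.get('name') != 'Title':
            continue
        my_author = first_after(items, i, 'Author')
        next_title = first_after(items, i, 'Title')
        # If the next Title reaches the same Author node, it isn't ours.
        if next_title is not None and first_after(items, next_title, 'Author') == my_author:
            my_author = None
        authors = [] if my_author is None else [v.text for v in items[my_author].findall('value')]
        result.append((item.text, authors))
    return result

for title, authors in books_with_authors(ET.fromstring(SAMPLE)):
    print(title, authors)
```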

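I'm not sure you really want to go there. The simplest approach I found was to start from the author, get the preceding title, and then check that the first author or title after that title is indeed an author. Ugly!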

/books/attribute[@name="Author"]
  [preceding-sibling::attribute[@name="Title" and string()="Book B"]
                               [following-sibling::attribute[ @name="Author" 
                                                             or @name="Title"
                                                            ]
                                 [1]
                                 [@name="Author"]
                               ]
  ][1]

(I added a books element wrapping the content of the file.)

I tested it with libxml2, via xml_grep2 by the way, but only on the sample data you provided, so more testing is welcome.
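The check in this XPath can also be phrased from the Title's side: find the Title, then look at the first following sibling that is either an Author or a Title, and keep it only if it is an Author. The sketch below assumes titles are unique and uses a hypothetical helper name and an assumed `<t>` root; it mirrors the logic, not how libxml2 evaluates the expression.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in an assumed <t> root element.
SAMPLE = """<t>
  <attribute name="Title">Book A</attribute>
  <attribute name="Code">1</attribute>
  <attribute name="Author"><value>James Berry</value><value>John Smith</value></attribute>
  <attribute name="Title">Book B</attribute>
  <attribute name="Code">2</attribute>
  <attribute name="Title">Book C</attribute>
  <attribute name="Code">3</attribute>
  <attribute name="Author"><value>James Berry</value></attribute>
</t>"""

def authors_via_next_sibling(root, title):
    """Hypothetical helper: authors of `title`, or [] if it has none.

    Skips non-Author/Title siblings (such as Code) after the Title and
    inspects the first Author-or-Title sibling that follows it.
    """
    items = root.findall('attribute')
    for i, item in enumerate(items):
        if item.get('name') == 'Title' and item.text == title:
            for nxt in items[i + 1:]:
                if nxt.get('name') == 'Author':
                    return [v.text for v in nxt.findall('value')]
                if nxt.get('name') == 'Title':
                    return []  # the next book starts before any Author
            return []  # last Title, no Author after it
    return []  # no Title with that text

root = ET.fromstring(SAMPLE)
print(authors_via_next_sibling(root, 'Book B'))  # []
print(authors_via_next_sibling(root, 'Book C'))  # ['James Berry']
```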

License: CC-BY-SA with attribution

Not affiliated with StackOverflow