Navigating to a node using XPath in a flat structure

https://stackoverflow.com/questions/614370

03-07-2019

Question

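I have an XML file in a flat structure. We do not control the format of this XML file; we just have to deal with it. I've renamed the fields because they are highly domain-specific and make no real difference to the problem.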

<attribute name="Title">Book A</attribute>
<attribute name="Code">1</attribute>
<attribute name="Author">
   <value>James Berry</value>
   <value>John Smith</value>
</attribute>
<attribute name="Title">Book B</attribute>
<attribute name="Code">2</attribute>
<attribute name="Title">Book C</attribute>
<attribute name="Code">3</attribute>
<attribute name="Author">
    <value>James Berry</value>
</attribute>

The key point: the file isn't particularly hierarchical. Books are delimited by the occurrence of an attribute element with name='Title'. However, the name='Author' attribute node is optional.

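Is there a simple XPath statement I can use to find the authors of book 'n'? It's easy to identify the title of book 'n', but the author values are optional. And you can't just take the next author, because for book 2 that would give you the author of book 3.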

I've written a state machine that parses this as a series of elements, but I can't help thinking there must be a way to get the results I want directly.
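For illustration, a single-pass state machine of the kind mentioned above might look like the sketch below. It is not the asker's actual code: `iter_books` and `SAMPLE` are hypothetical names, and the sample is wrapped in an assumed root element `<t>` so that it parses as well-formed XML.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the flat elements are wrapped in a single root element <t>.
SAMPLE = """<t>
  <attribute name="Title">Book A</attribute>
  <attribute name="Code">1</attribute>
  <attribute name="Author">
    <value>James Berry</value>
    <value>John Smith</value>
  </attribute>
  <attribute name="Title">Book B</attribute>
  <attribute name="Code">2</attribute>
  <attribute name="Title">Book C</attribute>
  <attribute name="Code">3</attribute>
  <attribute name="Author">
    <value>James Berry</value>
  </attribute>
</t>"""


def iter_books(source):
    """Yield (title, authors) per book; authors is [] when none is given."""
    title, authors = None, []
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "attribute":
            continue
        name = elem.get("name")
        if name == "Title":
            if title is not None:
                yield title, authors  # a new Title closes the previous book
            title, authors = elem.text, []
        elif name == "Author" and title is not None:
            authors.extend(v.text for v in elem.findall("value"))
    if title is not None:
        yield title, authors  # flush the last book


books = dict(iter_books(io.BytesIO(SAMPLE.encode("utf-8"))))
print(books)
```

Unlike a plain "take the next author" lookup, the state is reset on every Title, so Book B ends up with an empty author list instead of borrowing Book C's author.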

Solution

We want the "attribute" element with @name 'Author' that follows an "attribute" element with @name 'Title' and value 'Book n', with no other "attribute" element with @name 'Title' between them (if there is one, the author wrote some other book).

Put differently, we want the authors whose first preceding title (the one they "belong to") is the one we are looking for:

//attribute[@name='Author']
[preceding-sibling::attribute[@name='Title'][1][contains(.,'Book N')]]

N = C => finds <attribute name="Author"><value>James Berry</value></attribute>

N = B => finds nothing
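Python's standard `xml.etree.ElementTree` supports only a subset of XPath and has no `preceding-sibling` axis, so the expression above cannot be passed to it directly. The sketch below spells out the same selection logic by hand instead. `authors_for` and `SAMPLE` are hypothetical names, the root element `<t>` is an assumption, and only direct children of the root are examined (the `//` in the XPath would match at any depth).

```python
import xml.etree.ElementTree as ET

# Assumption: the flat elements are wrapped in a single root element <t>.
SAMPLE = """<t>
  <attribute name="Title">Book A</attribute>
  <attribute name="Code">1</attribute>
  <attribute name="Author">
    <value>James Berry</value>
    <value>John Smith</value>
  </attribute>
  <attribute name="Title">Book B</attribute>
  <attribute name="Code">2</attribute>
  <attribute name="Title">Book C</attribute>
  <attribute name="Code">3</attribute>
  <attribute name="Author">
    <value>James Berry</value>
  </attribute>
</t>"""


def authors_for(root, needle):
    """Return Author elements whose nearest preceding Title contains needle."""
    siblings = list(root)
    found = []
    for i, node in enumerate(siblings):
        if node.tag != "attribute" or node.get("name") != "Author":
            continue
        # preceding-sibling is a reverse axis, so [1] means the nearest
        # Title before this Author: walk backwards and stop at the first one.
        for prev in reversed(siblings[:i]):
            if prev.tag == "attribute" and prev.get("name") == "Title":
                if needle in "".join(prev.itertext()):  # contains(., needle)
                    found.append(node)
                break
    return found


root = ET.fromstring(SAMPLE)
for n in ("Book C", "Book B"):
    names = [v.text for a in authors_for(root, n) for v in a.findall("value")]
    print(n, "=>", names)
```

Because `contains()` is a substring test, a needle such as 'Book 1' would also match a title like 'Book 10'; comparing the title for equality is stricter if titles can share prefixes.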

Using keys and/or the grouping features available in XSLT 2.0 would make this easier (and much faster if the file is large).

(The SO code parser seems to think that '//' starts a comment, but in XPath it does not!!! Sigh.)

Other tips

Well, I used ElementTree to extract the data from the XML above. I saved the XML in a file named foo.xml.

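from xml.etree.ElementTree import parse

def extract_data(path='foo.xml'):
    """Return a dict mapping each book title to its list of authors.

    Assumes the flat <attribute> elements in the file are wrapped in a
    single root element, since well-formed XML needs exactly one root.
    Books without an Author attribute are left out of the result.
    """
    root = parse(path).getroot()
    authors_by_title = {}
    title = None

    for attribute in root.findall('attribute'):
        name = attribute.get('name')
        if name == 'Title':
            # A Title starts a new book.
            title = attribute.text
        elif name == 'Author' and title is not None:
            # An Author belongs to the most recently seen Title.
            authors_by_title[title] = [v.text for v in attribute.findall('value')]
    return authors_by_title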

Running this function gives:

{'Book A': ['James Berry', 'John Smith'], 'Book C': ['James Berry']}

I hope this is what you are looking for. If not, please specify a bit more. :)

As bambax pointed out in his answer, a solution using XSLT keys is more efficient, especially for large XML documents:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>
 <!--                                            -->
 <xsl:key name="kAuthByTitle" 
  match="attribute[@name='Author']"
  use="preceding-sibling::attribute[@name='Title'][1]"/>
 <!--                                            -->
    <xsl:template match="/">
      Book C Author:
      <xsl:copy-of select=
         "key('kAuthByTitle', 'Book C')"/>
  <!--                                            -->
         ====================
      Book B Author:
      <xsl:copy-of select=
         "key('kAuthByTitle', 'Book B')"/>
    </xsl:template>
</xsl:stylesheet>

When the above transformation is applied to this XML document:

<t>
    <attribute name="Title">Book A</attribute>
    <attribute name="Code">1</attribute>
    <attribute name="Author">
        <value>James Berry</value>
        <value>John Smith</value>
    </attribute>
    <attribute name="Title">Book B</attribute>
    <attribute name="Code">2</attribute>
    <attribute name="Title">Book C</attribute>
    <attribute name="Code">3</attribute>
    <attribute name="Author">
        <value>James Berry</value>
    </attribute>
</t>

the correct output is produced:

  Book C Author:
  <attribute name="Author">
    <value>James Berry</value>
</attribute>

     ====================
  Book B Author:

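Do note that the "//" XPath abbreviation should be avoided as much as possible, because it usually causes the whole XML document to be scanned on every evaluation of the XPath expression.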
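The difference between a direct child path and a descendant search can be seen with ElementTree's limited XPath support. This is only a rough sketch on the sample data (the name `SAMPLE` is hypothetical); how much of the tree a particular XPath engine really visits depends on its implementation.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<t>
  <attribute name="Title">Book A</attribute>
  <attribute name="Code">1</attribute>
  <attribute name="Author">
    <value>James Berry</value>
    <value>John Smith</value>
  </attribute>
  <attribute name="Title">Book B</attribute>
  <attribute name="Code">2</attribute>
  <attribute name="Title">Book C</attribute>
  <attribute name="Code">3</attribute>
  <attribute name="Author">
    <value>James Berry</value>
  </attribute>
</t>"""

root = ET.fromstring(SAMPLE)

direct = root.findall("attribute")       # looks at children of the root only
anywhere = root.findall(".//attribute")  # searches every descendant

# Number of elements below the root that a descendant search has to consider.
descendants = sum(1 for _ in root.iter()) - 1

print(len(direct), len(anywhere), descendants)  # prints: 8 8 11
```

Both paths select the same eight elements here, but the descendant form also has to look inside the `value` children, and the gap grows with the depth and size of the document.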

Select all titles and apply templates:

<xsl:template match="/">
  <xsl:apply-templates select="//attribute[@name='Title']"/>
</xsl:template>

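In the template, output the title, then check whether a next title exists. If not, output the next author. If it does exist, check whether the following author node of the next book is the same as the following author node of the current book. If it is, the current book has no author: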

<xsl:template match="*">
  <book>
    <title><xsl:value-of select="."/></title>
    <author>
      <xsl:if test="not(following::attribute[@name='Title']) or following::attribute[@name='Author'] != following::attribute[@name='Title']/following::attribute[@name='Author']">
        <xsl:value-of select="following::attribute[@name='Author']"/>
      </xsl:if>
    </author>
  </book>
</xsl:template>

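I'm not sure you really want to go there: the simplest way I found was to start from the author, get the preceding title, and then check that the first author or title following that title is in fact an author. Ugly!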

/books/attribute[@name="Author"] [preceding-sibling::attribute[@name="Title" and string()="Book B"] [following-sibling::attribute[ @name="Author" or @name="Title" ] [1] [@name="Author"] ] ][1]

(I added a books tag to wrap the file.)

BTW, I tested this with libxml2 using xml_grep2, but only on the sample data you gave, so more tests are welcome.
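The check in that expression (the first Author-or-Title sibling after the title must be an Author) can also be written out in plain Python as a rough sketch. `author_of` and `SAMPLE` are hypothetical names, and the `books` wrapper element follows the assumption stated above.

```python
import xml.etree.ElementTree as ET

# Assumption: the flat elements are wrapped in a <books> element.
SAMPLE = """<books>
  <attribute name="Title">Book A</attribute>
  <attribute name="Code">1</attribute>
  <attribute name="Author">
    <value>James Berry</value>
    <value>John Smith</value>
  </attribute>
  <attribute name="Title">Book B</attribute>
  <attribute name="Code">2</attribute>
  <attribute name="Title">Book C</attribute>
  <attribute name="Code">3</attribute>
  <attribute name="Author">
    <value>James Berry</value>
  </attribute>
</books>"""


def author_of(root, title):
    """Return the Author element of the book with this title, or None."""
    siblings = list(root)
    for i, node in enumerate(siblings):
        if node.get("name") == "Title" and node.text == title:
            # Look at the first following sibling that is an Author or a
            # Title: an Author belongs to this book, a Title starts the next.
            for nxt in siblings[i + 1:]:
                if nxt.get("name") in ("Author", "Title"):
                    return nxt if nxt.get("name") == "Author" else None
            return None  # reached the end without finding either
    return None  # no book with this title


root = ET.fromstring(SAMPLE)
for t in ("Book A", "Book B", "Book C"):
    node = author_of(root, t)
    print(t, "=>", None if node is None else [v.text for v in node.findall("value")])
```

Going forward from the title this way stops at the next Title, so Book B correctly yields nothing rather than picking up Book C's author.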

License: CC-BY-SA with attribution

Not affiliated with StackOverflow