Question

I am trying to select the text of the <li> ancestor immediately above the parent <li> of each <a>. Here is a sample of the document, im-trg.xml:

<trg>
<category>
    <h2>Accounting and Auditing</h2>
    <ul>
    <li>Laws and Regulations
        <ul>
            <li><a href="url1">Regulation S-X</a></li>
        </ul>
    </li>
    <li>Staff Guidance
        <ul>
            <li>No Action Letters
                <ul>
                    <li><a href="url2">Robert Van Grover, Esq., Seward and Kissel LLP</a> (November 5, 2013)</li>
                </ul>
            </li>
        </ul>
    </li>
    </ul>
</category>
</trg>

Here is my query:

for $x in doc("C:\im-trg.xml")//li/a
return 
<item>
<title>{data($x)}</title>
<documentType>{data($x/ancestor::li[2])}</documentType>
<category>{data($x/ancestor::category/h2)}</category>
</item>

I am getting:

<item>
  <title>Regulation S-X</title>
  <documentType>Laws and RegulationsRegulation S-X</documentType>
  <category>Accounting and Auditing</category>
</item>

For <documentType>, I want to select only the ancestor <li> immediately previous to the <li> parent of the <a>, which indicates the type of document, so I want:

<item>
  <title>Regulation S-X</title>
  <documentType>Laws and Regulations</documentType>
  <category>Accounting and Auditing</category>
</item>

and

<item>
  <title>Robert Van Grover, Esq., Seward and Kissel LLP</title>
  <documentType>No Action Letters</documentType>
  <category>Accounting and Auditing</category>
</item>

I don't think I can come down from the root because the parent <li> is sometimes double nested and sometimes triple nested.


Solution

The string value of an element is the concatenation of all its descendant text nodes, which is why data() on the outer <li> also picks up "Regulation S-X" from the nested list. If you only want the text directly contained in the element, select its text children explicitly, e.g.:

data($x/ancestor::li[2]/text())
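
For context, here is a minimal sketch of the full query from the question with that change applied. It assumes the sample document as posted. Because the outer <li> has two text children (the label before the nested <ul> and the whitespace after it), the sketch joins them and trims with normalize-space(); that extra step is an addition here, not part of the original answer, and may be unnecessary if your processor strips whitespace-only text nodes when parsing.

```xquery
(: Sketch: the question's query, with documentType built from the
   li's own text children instead of its full string value. :)
for $x in doc("C:\im-trg.xml")//li/a
return
<item>
  <title>{data($x)}</title>
  <documentType>{normalize-space(string-join($x/ancestor::li[2]/text(), ""))}</documentType>
  <category>{data($x/ancestor::category/h2)}</category>
</item>
```

With the sample above, this should give "Laws and Regulations" for the first item and "No Action Letters" for the second, regardless of how deeply the parent <li> is nested, since ancestor::li[2] is counted upward from each <a>.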
Licensed under: CC-BY-SA with attribution