Question

I have a set of files that contain definitions that I want to compile into a single list. The list of files is stored in an XML file that looks like this (this is the input file):

 <report>
 <incident>
 <file>Balance_fields_selected.htm</file>
 </incident>
 <incident>
 <file>Cd_fields.htm</file>
 </incident>
 </report>

Each file specified by a <file> element contains a series of <p class='Term'> elements I need to compile into a single list. Each of these is followed by some arbitrary number of other elements that need to be grouped (I am trying to use keys):

 <html><body>
 <p class="Term">
<a name="Accrued_Bonus_Interest" id="Accrued_Bonus_Interest"></a>Accrued (Bonus Interest)</p>
<p>Bonus Interest Accrued Cycle-to-Date. &#160;Amount of bonus interest that has accrued on the time deposit.</p>
<p>Pages: &#160;View CD Detail.</p>
<p class="Term">
<a name="Accrued_OID" id="Accrued_OID"></a>Accrued (Original Issue Discount)</p>
<p>Original Issue Discount Interest Accrued Year-to-Date. &#160;OID interest accrued in the current year.</p>
<p>Pages: &#160;View CD Detail.</p> 
 </body></html>

The desired result looks something like:

 <topic>
 <title>Arbitrary title</title>
 <body>
 <dl>
 <dlentry id="Accrued_Bonus_Interest">
 <dt>Accrued (Bonus Interest)</dt>
 <dd><p>Bonus Interest Accrued Cycle-to-Date. &#160;Amount of bonus interest that has accrued on the time deposit.</p>
 <p>Pages: View CD Detail</p></dd></dlentry>
 <dlentry id="Accrued_OID">
 <dt>Accrued (Original Issue Discount)</dt>
 <dd><p>Original Issue Discount Interest Accrued Year-to-Date. &#160;OID interest accrued in the current year.</p>
 <p>Pages: View CD Detail</p></dd></dlentry>
 </dl>
 </body>
 </topic>

I have a stylesheet that accomplishes most of this already -- it looks like I am just (again) lost when it comes to the proper use of keys. The following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:strip-space elements="*" />

<xsl:key name="kFollowing" match="*[not(p[@class='Term'])]" 
    use="generate-id(preceding::p[@class='Term'][1])"/>

<xsl:template match="/">
    <![CDATA[ 
    <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">  ]]>
    <topic id="data_dictionary">
        <title>IBS Insight Data Dictionary</title>
        <body>
    <dl>
        <xsl:for-each select="/report/incident/file">
            <xsl:for-each select="document(.)/descendant::p[@class='Term']">
                <xsl:variable name="vFollowing" select="key('kFollowing', generate-id())"/> 
                <xsl:element name="dlentry">
                    <xsl:attribute name="id">
                        <xsl:value-of select="child::a/@id"/>
                    </xsl:attribute>
                    <dt><xsl:value-of select="."/></dt>
                    <dd><xsl:value-of select="$vFollowing"/>
                        <xsl:for-each select="following::*[$vFollowing]">
                            <xsl:apply-templates select="."/>
                        </xsl:for-each>
                    </dd>
                </xsl:element>
            </xsl:for-each>
      </xsl:for-each>                
    </dl></body></topic>
</xsl:template>

</xsl:stylesheet>

This stylesheet correctly crawls the files listed in the input file and generates a dl with correct dlentry ids and dt elements. The problem is the way my confused implementation of keys assembles the dd element. It's not as apparent with my sample XML here, but each <p class='Term'> grabs all subsequent content to populate its <dd>, like so:

     <topic>
 <title>Arbitrary title</title>
 <body>
 <dl>
 <dlentry id="Accrued_Bonus_Interest">
 <dt>Accrued (Bonus Interest)</dt>
 <dd>Bonus Interest Accrued Cycle-to-Date. &#160;Amount of bonus interest that has accrued on the time deposit. Pages: View CD DetailAccrued (Original Issue Discount)Original Issue Discount Interest Accrued Year-to-Date. &#160;OID interest accrued in the current year.Pages: View CD Detail</dd></dlentry>
 <dlentry id="Accrued_OID">
 <dt>Accrued (Original Issue Discount)</dt>
 <dd><p>Original Issue Discount Interest Accrued Year-to-Date. &#160;OID interest accrued in the current year.</p>
 <p>Pages: View CD Detail</p></dd></dlentry>
 </dl>
 </body>
 </topic>

The last item in each file is rendered correctly, but only because there are no more following nodes to process. Something about my code is matching far too many nodes with the keys.

Thanks for looking.


Solution

I would change the key definition slightly:

<xsl:key name="kFollowing"
    match="*[not(self::p[@class='Term'])][preceding-sibling::p[@class='Term']]"
    use="generate-id(preceding-sibling::p[@class='Term'][1])"/>

This matches any element that is not itself a <p class="Term"> but which is at the same level in the tree as such an element, and groups these by their nearest preceding "Term". If you want to allow for cases where the content following the <p class="Term"> is just text nodes (i.e. not inside any element) then you need instead

<xsl:key name="kFollowing"
    match="node()[not(self::p[@class='Term'])][preceding-sibling::p[@class='Term']]"
    use="generate-id(preceding-sibling::p[@class='Term'][1])"/>

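In the original, the predicate in following::*[$vFollowing] is only a boolean test (true whenever $vFollowing is non-empty), so it selects every following element rather than just the grouped ones. With the key doing the grouping, you can simplify the inner for-each to just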

        <xsl:for-each select="document(.)/descendant::p[@class='Term']">
            <dlentry id="{a/@id}">
                <dt><xsl:value-of select="."/></dt>
                <dd><xsl:copy-of select="key('kFollowing', generate-id())"/></dd>
            </dlentry>
        </xsl:for-each>
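
For reference, here is a sketch of how the pieces might fit together as one complete stylesheet. It reuses the topic id and title from the question and the element-only key from above. The xsl:output declaration is my own assumption and was not part of the original stylesheet. As far as I can tell, a DOCTYPE written inside a CDATA section is emitted as escaped text, so the doctype-public and doctype-system attributes are used here instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Assumption: let the serializer write the DOCTYPE instead of a CDATA section. -->
  <xsl:output method="xml"
      doctype-public="-//OASIS//DTD DITA Topic//EN"
      doctype-system="topic.dtd"/>

  <xsl:strip-space elements="*"/>

  <!-- Index every non-Term element by the id of its nearest preceding Term sibling. -->
  <xsl:key name="kFollowing"
      match="*[not(self::p[@class='Term'])][preceding-sibling::p[@class='Term']]"
      use="generate-id(preceding-sibling::p[@class='Term'][1])"/>

  <xsl:template match="/">
    <topic id="data_dictionary">
      <title>IBS Insight Data Dictionary</title>
      <body>
        <dl>
          <!-- Each <file> names a document to load, resolved relative to the input file. -->
          <xsl:for-each select="/report/incident/file">
            <xsl:for-each select="document(.)/descendant::p[@class='Term']">
              <dlentry id="{a/@id}">
                <dt><xsl:value-of select="."/></dt>
                <!-- key() searches the document containing the current Term. -->
                <dd><xsl:copy-of select="key('kFollowing', generate-id())"/></dd>
              </dlentry>
            </xsl:for-each>
          </xsl:for-each>
        </dl>
      </body>
    </topic>
  </xsl:template>

</xsl:stylesheet>
```

Because key() looks only in the document that holds the context node, the lookup has to happen inside the for-each over document(.), as it does here. Note that xsl:copy-of copies the grouped <p> elements as they are, so this assumes the source HTML files are in no namespace, as in the sample.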
Licensed under: CC-BY-SA with attribution