XSLT: Copy nodes from multiple XML files, replace or add nodes to another XML file and transform the whole content to HTML

StackOverflow https://stackoverflow.com/questions/21291198

Question

What is this all about?

We need to create an XSLT file that performs the steps described below. The whole story is about legislation, so think about how legislative modifications work. You have an old law X, and the legislator brings in a new law Y that contains modifications referring to law X. A modification can replace any kind of subpart of a law (section, article, paragraph, etc.).


Law X

Article 1

  1. SOMETHING REALLY OLD HERE
  2. blah blah blah

Article 2 [....]


Law Y

Article 1

  1. blahblahblah

Article 2

  1. Replace the 1st paragraph of the 1st article in Law X as: "SOMETHING FRESH AND AWESOME".

And what do you really want to get?


Law X'

Article 1

  1. SOMETHING FRESH AND AWESOME
  2. blah blah blah

Article 2 [....]


Let's move to XML legislative world

We have an XML file X like this:

<Legislation DocumentURI="X">
<Metadata>[...]</Metadata>
<Introduction>[...]</Introduction>
<Body>
<Article DocumentURI="X/article/1">
<P DocumentURI="X/article/1/paragraph/1">
   <P2 DocumentURI="X/article/1/paragraph/1/passage/1">
            <Text>SOMETHING REALLY OLD HERE</Text>
   </P2>
</P>
<P DocumentURI="X/article/1/paragraph/2">
   <P2 DocumentURI="X/article/1/paragraph/2/passage/1">
            <Text>blahblahblah</Text>
   </P2>
</P>
</Article>
</Body>
</Legislation>

We have an XML file Y like this:

<Legislation DocumentURI="Y">
<Metadata>[...]</Metadata>
<Introduction>[...]</Introduction>
<Body>
<Article DocumentURI="Y/article/1">
<P DocumentURI="Y/article/1/paragraph/1">
   <P2 DocumentURI="Y/article/1/paragraph/1/passage/1">
            <Text>blahblahblah</Text>
   </P2>
</P>
</Article>
<Article DocumentURI="Y/article/2">
<P DocumentURI="Y/article/2/paragraph/1">
   <P2 DocumentURI="Y/article/2/paragraph/1/passage/1">
            <Text>Replace the 1st paragraph of the 1st article in Law X as:</Text>
   </P2>
   <Modification DocumentURI="Y/modification/1">
     <P2 DocumentURI="X/article/1/paragraph/1/passage/1">
        <Text>SOMETHING FRESH AND AWESOME</Text>
     </P2>
   </Modification>
</P>
</Article>
</Body>
</Legislation>

And we'll produce X':

<Legislation DocumentURI="X">
<Metadata>[...]</Metadata>
<Introduction>[...]</Introduction>
<Body>
<Article DocumentURI="X/article/1">
<P DocumentURI="X/article/1/paragraph/1">
   <P2 DocumentURI="X/article/1/paragraph/1/passage/1">
            <Text>SOMETHING FRESH AND AWESOME</Text>
   </P2>
</P>
<P DocumentURI="X/article/1/paragraph/2">
   <P2 DocumentURI="X/article/1/paragraph/2/passage/1">
            <Text>blahblahblah</Text>
   </P2>
</P>
</Article>
</Body>
</Legislation>

TIP: The content of a <Modification> can be any kind of subpart of a law. It may be a single XML element like the one above, or a whole nested structure such as an entire article!

TIP2: Who told you that a <Modification> to the same element occurs only once? Maybe in 2012 a law Y modifies a specific part, and then in 2013 a new law Z modifies the already modified part in a different way, or modifies a subpart of it!

TIP3: Who told you that a <Modification> refers to an element that already exists in file X? Maybe law Y adds a new part to the old law X!

How do we know where the modifications are?

  1. We know their DocumentURI attribute values.
  2. We know which XML files contain them.

Before the XSLT runs, an action (findmodifications) is called. The action takes the wildcard (*) from the URL, which represents the law's id, and searches an RDF store for modifications. Finally, it sets a request parameter called modifications to return the results.

The modifications parameter has the following structure:

modification[0][0] = PD201210.xml  
modification[0][1] = http://localhost:8888/GRLegislation/pd/2012/10/modification/1  
modification[1][0] = PD201210.xml  
modification[1][1] = http://localhost:8888/GRLegislation/pd/2012/10/modification/2  
.  
.  
.

And after all that, what do you want to do with the modified XML content X'?

I use the Apache Cocoon framework for my web application project, so I have a specific pipeline in my sitemap. It calls an action to find where the modifications are, then takes the X file, applies the XSLT transformations and serializes the result as HTML.

<map:pipeline id="law-updated">
      <map:match pattern="pd/*/updated">
        <map:act type="findmodifications">      
        </map:act>
        <map:generate src="{1}.xml" type="file"/>
        <map:transform src="legislation_updated.xslt" type="xslt"/>
        <map:serialize type="xhtml"/>
      </map:match>
</map:pipeline>

My XSLT template for HTML transformations is here:

<xsl:template match="/">
    <html>
        <head>
        </head>
        <body>
            <div id="wrapper">
                <div id="header">
                        ............
                </div>
                <div id="content">
                    <div id="content_column">
                        <div class="button" >
                            <a href="{Legislation/Metadata/dc:identifier}/data.rdf">RDF</a>
                        </div><br/><br/>
                        <div class="button" >
                            <a style="color: #fff;" href="{Legislation/Metadata/dc:identifier}/data.pdf">PDF</a>
                        </div><br/>

                    </div>
                    <div id="content_body">
                        <div id="content_text">
                          <div id="content_bar">
                                 <ul>
                                    <!-- attribute value templates keep stray whitespace out of the href values -->
                                    <li style="font-size: 16px; font-weight: bold; float:left; padding-right:2em;">
                                        <!-- Contents -->
                                        <a href="{Legislation/Metadata/dc:identifier}/contents">Περιεχόμενα</a>
                                    </li>
                                    <li style="font-size: 16px; font-weight: bold; float:left; padding-right:2em;">
                                        <!-- Text -->
                                        <a href="{Legislation/Metadata/dc:identifier}">Κείμενο</a>
                                    </li>
                                    <li style="font-size: 16px; font-weight: bold; float:left; padding-right:2em;">
                                        <!-- Timeline -->
                                        <a href="{Legislation/Metadata/dc:identifier}/timeline">Χρονολόγιο</a>
                                    </li>
                                    <li style="font-size: 16px; font-weight: bold; float:left; padding-right:2em;">
                                        <!-- Citations -->
                                        <a href="{Legislation/Metadata/dc:identifier}/citations">Παραπομπές</a>
                                    </li>
                                </ul>
                                </div>
                                <br/>
                                <br/>
                        <table border="0">
                        <tr><td>ΦΕΚ:&#160;</td> 
                            <td><xsl:value-of select="Legislation/Metadata/diavgeia:fek/issue"/>
                        / <xsl:value-of select="Legislation/Metadata/diavgeia:fek/year"/>
                        / <xsl:value-of select="Legislation/Metadata/diavgeia:fek/fekNumber"/>
                        </td></tr>
                        <tr>
                            <td>ΚΩΔΙΚΟΣ ΑΠΟΦΑΣΗΣ:&#160;</td>
                            <td> <xsl:value-of select="Legislation/Metadata/diavgeia:decisionType/diavgeia:label"/>
                            &#160; <xsl:value-of select="Legislation/Metadata/Year"/>
                            / <xsl:value-of select="Legislation/Metadata/Number"/>
                            </td>
                        </tr>
                        <tr>
                            <td>ΗΜΕΡΟΜΗΝΙΑ ΕΚΔΟΣΗΣ:&#160;</td> 
                            <td>
                                <xsl:value-of select="Legislation/Metadata/dc:created"/>

                            </td>

                        </tr>
                        <tr>
                            <td>ΥΠΟΓΡΑΦΗ:&#160;</td> 
                            <td>
                                <xsl:for-each select="Legislation/Metadata/diavgeia:signer">
                                <a><xsl:attribute name="href">http://localhost:8888/GRLegislation/signer/<xsl:value-of select="@uid"/></xsl:attribute>
                                <xsl:value-of select="diavgeia:firstName"/>&#160;
                                <xsl:value-of select="diavgeia:lastName"/>
                                </a>
                                <xsl:choose>
                                <xsl:when test="position() != last()">,&#160;</xsl:when>
                                </xsl:choose>
                                </xsl:for-each>
                            </td>

                        </tr>
                         <tr>
                            <td>ΣΧΕΤΙΚΑ ΜΕ:&#160;</td> 
                            <td>
                                <xsl:for-each select="Legislation/Metadata/diavgeia:tag">
                                <xsl:value-of select="diavgeia:label"/>
                                <xsl:choose>
                                <xsl:when test="position() != last()">,&#160;</xsl:when>
                                </xsl:choose>
                               </xsl:for-each>
                            </td>

                        </tr>
                        </table>
                        <br/>
                        <h4>
                            <xsl:value-of select="Legislation/Metadata/dc:title"/>
                        </h4>

                                                 <br/>
                                                 <xsl:for-each select="Legislation/Body/Article">
                                                    <div><xsl:attribute name="id"><xsl:value-of select="@id"/></xsl:attribute>
                                                    <p>
                                                        <h5>Άρθρο
                                                            <xsl:value-of select="Number"/>
                                                        </h5>
                                                        <xsl:if test="Title">
                                                        <h6>
                                                            <xsl:value-of select="Title"/>
                                                         </h6>
                                                         </xsl:if>
                                                        <br/>
                                                        <ol>
                                                        <xsl:for-each select="P">
                                                            <div><xsl:attribute name="id">
                                                                <xsl:value-of select="@DocumentURI"/>
                                                                </xsl:attribute>
                                                            <li>
                                                             <xsl:for-each select="P2">
                                                                 <xsl:value-of select="Text"/>&#160;
                                                             </xsl:for-each>
                                                             <xsl:for-each select="List">
                                                                 <ol style="list-style-type:lower-greek">
                                                                 <xsl:for-each select="Case">
                                                                     <li>
                                                                         <xsl:value-of select="Text"/>
                                                                     </li>
                                                                 </xsl:for-each>
                                                                 </ol>
                                                             </xsl:for-each>
                                                             <xsl:for-each select="Modification">

                                                                <div id="modification" >
                                                                 <xsl:if test="P">
                                                                    <ol>
                                                                    <xsl:for-each select="P">
                                                                        <li>
                                                                            <xsl:attribute name="value">
                                                                            <xsl:value-of select="Number"/>
                                                                            </xsl:attribute>
                                                                         <xsl:for-each select="P2">
                                                                             <xsl:value-of select="Text"/>&#160;
                                                                         </xsl:for-each>
                                                                         <xsl:for-each select="List">
                                                                             <ol style="list-style-type:lower-greek">
                                                                             <xsl:for-each select="Case">
                                                                                 <li>
                                                                                     <xsl:value-of select="Text"/>
                                                                                 </li>
                                                                             </xsl:for-each>
                                                                             </ol>
                                                                         </xsl:for-each>
                                                                        </li> 
                                                                    </xsl:for-each>
                                                                    </ol>
                                                                 </xsl:if>
                                                                 <xsl:if test="P2">
                                                                        <xsl:for-each select="P2">
                                                                        <xsl:value-of select="Text"/>&#160;
                                                                        </xsl:for-each>
                                                                 </xsl:if>
                                                                 <xsl:if test="Case">
                                                                     <ol>   
                                                                    <xsl:for-each select="Case">
                                                                     <li><xsl:attribute name="value">
                                                                            <xsl:value-of select="Value"/>
                                                                            </xsl:attribute>
                                                                         <xsl:value-of select="Text"/>
                                                                     </li>
                                                                    </xsl:for-each>
                                                                     </ol>
                                                                 </xsl:if>
                                                                </div>
                                                             </xsl:for-each>
                                                            </li> 
                                                            </div>                                                           
                                                        </xsl:for-each>
                                                        </ol>
                                                    </p>
                                                    </div>
                                                 </xsl:for-each>
                    </div>
                    </div>
                </div>
            </div>
        </body>
    </html>
  </xsl:template>

In case you need extra information, please ask. A real XML X file is here. A real XML Y file, which contains modifications for X, is here. The modified XML should look like this. Thanks in advance!

MODIFICATION TYPES:

  1. Replacement of a node, like modifications 2,3,4 of real XML file Y.
  2. Addition of a node, like modification 1 of real XML file Z.
  3. Deletion of a node. Let's say a deletion is expressed as a modification whose target node is empty, like <Modification DocumentURI=""><P2 DocumentURI=""></P2></Modification>
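
The three modification types can be told apart from the target URI and from whether the node is empty. The following Python sketch is only an illustration of that rule, not part of the Cocoon setup: the function name classify and the tiny BASE document are made up for the example, and "empty" is taken to mean "has no child elements".

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened stand-in for law X.
BASE = ET.fromstring(
    '<Legislation DocumentURI="X"><Body>'
    '<Article DocumentURI="X/article/1">'
    '<P DocumentURI="X/article/1/paragraph/1">'
    '<P2 DocumentURI="X/article/1/paragraph/1/passage/1"><Text>old</Text></P2>'
    '</P></Article></Body></Legislation>'
)


def classify(base_root, mod_element):
    """Classify one child of <Modification> against the base law.

    Assumption: a node without child elements that targets an existing
    DocumentURI is a deletion, as proposed in the list above.
    """
    existing = {
        el.get("DocumentURI") for el in base_root.iter() if el.get("DocumentURI")
    }
    uri = mod_element.get("DocumentURI")
    if uri in existing:
        # len() of an Element is its number of child elements.
        return "replacement" if len(mod_element) else "deletion"
    return "addition"


replacement = ET.fromstring(
    '<P2 DocumentURI="X/article/1/paragraph/1/passage/1"><Text>new</Text></P2>'
)
print(classify(BASE, replacement))
```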

Solution

After a few days I stumbled, almost by accident, on a solution to the last part of my question. We have 2 different XSLT stylesheets:

  • One for the HTML representation (see the question above), and
  • the stylesheet from Tomalak (see his answer), which handles the modifications

As I mentioned, we had to apply the modifications to our X XML file and then publish the modified X' as HTML. Apache Cocoon gives us the opportunity to implement these 2 steps as follows:

<map:pipeline id="law-updated">
      <map:match pattern="pd/*/updated">
        <map:act type="findmodifications">      
        </map:act>
        <map:generate src="{1}.xml" type="file"/>
        <!-- Tomalak stylesheet -->
        <map:transform src="legislation_updated.xslt" type="xslt"/>
        <!-- HTML stylesheet -->
        <map:transform src="html.xslt" type="xslt"/>
        <map:serialize type="xhtml"/>
      </map:match>
</map:pipeline>

It's as simple as that, but it isn't mentioned anywhere in the documentation.

Remaining issue: Before the XSLT runs, an action (findmodifications) is called. The action takes the wildcard (*) from the URL, which represents the law's id, and searches an RDF store for modifications. Finally, it sets a request parameter called modifications to return the results.

In Tomalak's stylesheet, the modifications are given inline (see the first 5 lines). We want to use the request parameter instead.

The modifications parameter has the following structure:

modification[0][0] = PD201210.xml  
modification[0][1] = http://localhost:8888/GRLegislation/pd/2012/10/modification/1  
modification[1][0] = PD201210.xml  
modification[1][1] = http://localhost:8888/GRLegislation/pd/2012/10/modification/2  
.  
.  
.
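
One way to think about bridging the two is to reduce the parameter to the distinct files it mentions, since the stylesheet takes one modifications file per run. The Python sketch below only illustrates that grouping. It assumes the parameter can be read as a list of (file, modification URI) pairs, and the name group_by_file is made up for the example; how the value actually reaches the stylesheet in Cocoon is not covered here.

```python
# Assumed in-memory form of the `modifications` parameter:
# one (file name, modification URI) pair per entry, values taken from above.
modifications = [
    ("PD201210.xml",
     "http://localhost:8888/GRLegislation/pd/2012/10/modification/1"),
    ("PD201210.xml",
     "http://localhost:8888/GRLegislation/pd/2012/10/modification/2"),
]


def group_by_file(pairs):
    """Group modification URIs by file, keeping first-seen file order."""
    grouped = {}  # dicts preserve insertion order in Python 3.7+
    for file_name, uri in pairs:
        grouped.setdefault(file_name, []).append(uri)
    return grouped


for file_name, uris in group_by_file(modifications).items():
    # Each file would be one run of the merge stylesheet.
    print(file_name, len(uris))
```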

OTHER TIPS

Here is a stylesheet that might start you off.

It is designed to merge one document (the "old state") with another document (the "new state").

  • It does not touch the document header/metadata. This is taken from the "old state". If you want to use something else, write an appropriate <xsl:template match="Metadata"> etc.
  • It runs through all nodes of the "old state". If there is a <Modification> in the "new state" that refers to any "old state" node, the content of the modification is copied.
  • Otherwise, the old state is copied.
  • It assumes all additions from the "new state" are appended to what already exists. Because otherwise, they would simply be changes, right?
  • It's parameter-based: the modificationsDoc parameter names the document that holds the modifications. If you want to apply several modifications, do so step by step, using the output of the previous run as the input for the next.

I have tested with the real XML files and your sample.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" />

  <xsl:param name="modificationsDoc" select="''" />

  <xsl:variable name="existingURIs" select="//@DocumentURI" />
  <xsl:variable name="mods" select="document($modificationsDoc)//Modification/*" />

  <xsl:template match="node() | @*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- only elements with DocumentURI can be target of modifications -->
  <xsl:template match="*[@DocumentURI]">
    <xsl:variable name="currURI" select="@DocumentURI" />
    <xsl:variable name="matchingMod" select="$mods[@DocumentURI = $currURI]" />

    <!-- replacements and deletions -->
    <xsl:choose>
      <xsl:when test="$matchingMod">
        <!-- no output for empty modifications (they are deletions) -->
        <xsl:if test="$matchingMod[*]">
          <xsl:copy-of select="$matchingMod" />
        </xsl:if>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="identity" />
      </xsl:otherwise>
    </xsl:choose>

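    <!-- additions: emitted once, after the last sibling that shares this URI prefix,
         so that a new node is not copied after every existing sibling -->
    <xsl:variable name="siblingURI">
      <xsl:call-template name="substring-before-last">
        <xsl:with-param name="string1" select="$currURI" />
        <xsl:with-param name="string2" select="'/'" />
      </xsl:call-template>
      <xsl:text>/</xsl:text>
    </xsl:variable>
    <xsl:if test="not(following-sibling::*[starts-with(@DocumentURI, $siblingURI)])">
      <xsl:copy-of select="$mods[
        number(substring-after(@DocumentURI, $siblingURI)) &gt; 0
        and not(@DocumentURI = $existingURIs)
      ]" />
    </xsl:if>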
  </xsl:template>

  <xsl:template name="substring-before-last">
    <xsl:param name="string1" select="''" />
    <xsl:param name="string2" select="''" />

    <xsl:if test="$string1 != '' and $string2 != ''">
      <xsl:variable name="head" select="substring-before($string1, $string2)" />
      <xsl:variable name="tail" select="substring-after($string1, $string2)" />
      <xsl:value-of select="$head" />
      <xsl:if test="contains($tail, $string2)">
        <xsl:value-of select="$string2" />
        <xsl:call-template name="substring-before-last">
          <xsl:with-param name="string1" select="$tail" />
          <xsl:with-param name="string2" select="$string2" />
        </xsl:call-template>
      </xsl:if>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
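
The addition test in the stylesheet is easier to follow outside XSLT 1.0, which has no built-in "substring before last". The Python sketch below approximates the same check. The function names are made up for the example, and it only accepts plain positive integers as the last URI segment, which is narrower than what XPath's number() accepts.

```python
def sibling_prefix(document_uri):
    """Everything up to and including the last '/', like substring-before-last plus '/'."""
    head, _, _ = document_uri.rpartition("/")
    return head + "/"


def is_added_sibling(mod_uri, current_uri, existing_uris):
    """True if mod_uri names a new sibling of current_uri.

    Mirrors the stylesheet's condition: the part of mod_uri after the
    sibling prefix must be a positive number, and mod_uri must not
    already exist in the base document.
    """
    prefix = sibling_prefix(current_uri)
    if prefix not in mod_uri:
        return False
    rest = mod_uri.split(prefix, 1)[1]
    is_positive_int = rest.isascii() and rest.isdigit() and int(rest) > 0
    return is_positive_int and mod_uri not in existing_uris


existing = {"X/article/1/paragraph/1", "X/article/1/paragraph/2"}
print(sibling_prefix("X/article/1/paragraph/2"))
print(is_added_sibling("X/article/1/paragraph/3", "X/article/1/paragraph/2", existing))
```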

With your sample XML, the output is as desired:

<Legislation DocumentURI="X">
  <Metadata>[...]</Metadata>
  <Introduction>[...]</Introduction>
  <Body>
    <Article DocumentURI="X/article/1">
      <P DocumentURI="X/article/1/paragraph/1">
        <P2 DocumentURI="X/article/1/paragraph/1/passage/1">
          <Text>SOMETHING FRESH AND AWESOME</Text>
        </P2>
      </P>
      <P DocumentURI="X/article/1/paragraph/2">
        <P2 DocumentURI="X/article/1/paragraph/2/passage/1">
          <Text>blahblahblah</Text>
        </P2>
      </P>
    </Article>
  </Body>
</Legislation>
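
For cross-checking results outside Cocoon, the replacement and deletion rules can be sketched in a few lines of Python with the standard library. This is a rough illustration under stated assumptions, not a port of the stylesheet: the function name merge and the shortened sample documents are made up, additions are not handled, the root element itself is never replaced, and the target URI is assumed to sit in a DocumentURI attribute on the child of <Modification>.

```python
import copy
import xml.etree.ElementTree as ET

# Shortened, hypothetical versions of the sample files X and Y.
X = ET.fromstring(
    '<Legislation DocumentURI="X"><Body><Article DocumentURI="X/article/1">'
    '<P DocumentURI="X/article/1/paragraph/1">'
    '<P2 DocumentURI="X/article/1/paragraph/1/passage/1">'
    '<Text>SOMETHING REALLY OLD HERE</Text></P2></P>'
    '<P DocumentURI="X/article/1/paragraph/2">'
    '<P2 DocumentURI="X/article/1/paragraph/2/passage/1">'
    '<Text>blahblahblah</Text></P2></P>'
    '</Article></Body></Legislation>'
)
Y = ET.fromstring(
    '<Legislation DocumentURI="Y"><Body><Article DocumentURI="Y/article/2">'
    '<P DocumentURI="Y/article/2/paragraph/1">'
    '<Modification DocumentURI="Y/modification/1">'
    '<P2 DocumentURI="X/article/1/paragraph/1/passage/1">'
    '<Text>SOMETHING FRESH AND AWESOME</Text></P2>'
    '</Modification></P></Article></Body></Legislation>'
)


def merge(base_root, mods_root):
    """Return a copy of base_root with replacements and deletions applied."""
    # Map target URI -> replacement element (the children of each <Modification>).
    mods = {
        el.get("DocumentURI"): el
        for modification in mods_root.iter("Modification")
        for el in modification
        if el.get("DocumentURI")
    }

    def rebuild(node):
        new = ET.Element(node.tag, dict(node.attrib))
        new.text, new.tail = node.text, node.tail
        for child in node:
            mod = mods.get(child.get("DocumentURI"))
            if mod is None:
                new.append(rebuild(child))      # untouched: copy recursively
            elif len(mod):                      # has child elements: replacement
                replacement = copy.deepcopy(mod)
                replacement.tail = child.tail
                new.append(replacement)
            # a modification without child elements is a deletion: emit nothing
        return new

    return rebuild(base_root)


merged = merge(X, Y)
print(ET.tostring(merged, encoding="unicode"))
```

Applying a later law Z on top would then be merge(merge(X, Y), Z), which matches the step-by-step approach described for the stylesheet.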

Notes:

Licensed under: CC-BY-SA with attribution