Question

When I query data as XML from SQL Server, the result often contains duplicate XML nodes. I can usually tweak the query to eliminate them, but not always. When I can't, I end up with XML such as this:

<Xml>
<House houseId="3" address="123 Main">
    <Dog dogId="13" name="Rover">
        <Flea fleaId="17" name="Chester" />
    </Dog>
    <Dog dogId="13" name="Rover">
        <Flea fleaId="23" name="Poindexter" />            
    </Dog>
</House>
<House houseId="3" address="123 Main">
    <Human humanId="9" name="Mr. Johnson">
        <Child childId="11" name="Susie" />
    </Human>
    <Human humanId="9" name="Mr. Johnson">
        <Child childId="31" name="Sandy" />
    </Human>
</House>
<House houseId="5" address="987 Wall">
    <Dog dogId="13" name="Rover">
        <Flea fleaId="17" name="Chester" />
    </Dog>
    <Dog dogId="13" name="Rover">
        <Flea fleaId="19" name="Wilhelm" />            
    </Dog>
</House>
</Xml>

Notice that there are two <House> nodes next to each other that are identical in their attributes. They differ only in their child nodes. I'm trying to create an XSLT stylesheet that takes identical sibling nodes and collapses them into one node containing the superset of all their child nodes. In the example, <House houseId="3"> would contain both the <Dog> and <Human> nodes. Like this:

<Xml>
<House houseId="3" address="123 Main">
    <Dog dogId="13" name="Rover">
        <Flea fleaId="17" name="Chester" />
        <Flea fleaId="23" name="Poindexter" />            
    </Dog>
    <Human humanId="9" name="Mr. Johnson">
        <Child childId="11" name="Susie" />
        <Child childId="31" name="Sandy" />
    </Human>
</House>
<House houseId="5" address="987 Wall">
    <Dog dogId="13" name="Rover">
        <Flea fleaId="17" name="Chester" />
        <Flea fleaId="19" name="Wilhelm" />            
    </Dog>
</House>
</Xml>

Not only were the two identical House nodes combined, but the duplicate Dog and Human nodes were combined as well. Notice, however, that the <Dog dogId='13' name='Rover'> nodes listed under two different <House> nodes are not combined, because their ancestry differs and so they are not identical. That's what I'm going for: combining matching sibling nodes.

Because the XML is generated by SQL, the XSLT has to deal with nodes of many different names and arrangements, so I can't hard-code the node names. I can, however, rely on every node having a corresponding id attribute that contains a numerical value. For example: <House houseId="3">, <Dog dogId="17">, and <Flea fleaId="13">.
I also know that the root node will have no attributes, so I can begin processing with the nodes that are children of the root.

My strategy is to create an xsl:key for each node, where the node's key value is a concatenation of the id values of the node and its ancestors. Example key values are shown in the comments below:

<Xml>
<House houseId="3" address="123 Main"><!--"houseId=3"-->
    <Dog dogId="13" name="Rover" ><!--"houseId=3;dogId=13"-->
        <Flea fleaId="17" name="Chester" /><!--"houseId=3;dogId=13;fleaId=17"-->
    </Dog>
    <Dog dogId="13" name="Rover" ><!--"houseId=3;dogId=13"-->
        <Flea fleaId="23" name="Poindexter" /><!--"houseId=3;dogId=13;fleaId=23"-->         
    </Dog>
</House>
<House houseId="3" address="123 Main" ><!--"houseId=3"-->
    <Human humanId="9" name="Mr. Johnson" ><!--"houseId=3;humanId=9"-->
        <Child childId="11" name="Susie" /><!--"houseId=3;humanId=9;childId=11"-->
    </Human>
    <Human humanId="9" name="Mr. Johnson"><!--"houseId=3;humanId=9"-->
        <Child childId="31" name="Sandy" /><!--"houseId=3;humanId=9;childId=31"-->
    </Human>
</House>
<House houseId="5" address="987 Wall" ><!--"houseId=5"-->
    <Dog dogId="13" name="Rover"><!--"houseId=5;dogId=13"-->
        <Flea fleaId="17" name="Chester" /><!--"houseId=5;dogId=13;fleaId=17"-->
    </Dog>
    <Dog dogId="13" name="Rover"><!--"houseId=5;dogId=13"-->
        <Flea fleaId="19" name="Wilhelm" /><!--"houseId=5;dogId=13;fleaId=19"-->           
    </Dog>
</House>
</Xml>

So, the two seemingly matching occurrences of <Dog dogId='13' name='Rover'> would be differentiated by their ancestry:

<Xml><House houseId="3"><Dog dogId='13' name='Rover'>

houseId=3;dogId=13

vs.

<Xml><House houseId="5"><Dog dogId='13' name='Rover'>

houseId=5;dogId=13

With this, duplicate (sibling) nodes can be combined. Unfortunately, I'm struggling to understand how to implement this with XSLT and xsl:key. Any help would be greatly appreciated.

Solution

This transformation:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:my="my:my">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/*" priority="3">
  <xsl:sequence select="my:grouping(., *)"/>
 </xsl:template>

 <xsl:function name="my:grouping" as="element()*">
  <xsl:param name="pElem" as="element()"/>
  <xsl:param name="pChildren" as="element()*"/>

  <xsl:element name="{name($pElem)}" namespace="{namespace-uri($pElem)}">
   <xsl:apply-templates select="$pElem/@*"/>
   <xsl:for-each-group select="$pChildren" group-by="my:signature(.)">
     <xsl:copy>
       <xsl:apply-templates select="@*|node()[not(self::*)]"/>
       <xsl:apply-templates select=
           "my:grouping(., current-group()/*)/*"/>
     </xsl:copy>
   </xsl:for-each-group>
  </xsl:element>
 </xsl:function>

 <xsl:function name="my:signature" as="xs:string">
  <xsl:param name="pElem" as="element()"/>

  <!-- Keep the attribute nodes (not atomized strings) so both name and value are available -->
  <xsl:variable name="vAttribs" as="attribute()*">
   <xsl:perform-sort select="$pElem/@*">
     <xsl:sort select="name()"/>
   </xsl:perform-sort>
  </xsl:variable>
  <xsl:sequence select=
   "string-join((name($pElem)
                 ,for $at in $vAttribs
                   return concat(name($at), '+', $at)
                 )
                  ,'|')"/>
 </xsl:function>
</xsl:stylesheet>

when applied on the provided XML document:

<Xml>
    <House houseId="3" address="123 Main">
        <Dog dogId="13" name="Rover">
            <Flea fleaId="17" name="Chester" />
        </Dog>
        <Dog dogId="13" name="Rover">
            <Flea fleaId="23" name="Poindexter" />
        </Dog>
    </House>
    <House houseId="3" address="123 Main">
        <Human humanId="9" name="Mr. Johnson">
            <Child childId="11" name="Susie" />
        </Human>
        <Human humanId="9" name="Mr. Johnson">
            <Child childId="31" name="Sandy" />
        </Human>
    </House>
    <House houseId="5" address="987 Wall">
        <Dog dogId="13" name="Rover">
            <Flea fleaId="17" name="Chester" />
        </Dog>
        <Dog dogId="13" name="Rover">
            <Flea fleaId="19" name="Wilhelm" />
        </Dog>
    </House>
</Xml>

produces the wanted, correct result:

<Xml>
   <House houseId="3" address="123 Main">
      <Dog dogId="13" name="Rover">
         <Flea fleaId="17" name="Chester"/>
         <Flea fleaId="23" name="Poindexter"/>
      </Dog>
      <Human humanId="9" name="Mr. Johnson">
         <Child childId="11" name="Susie"/>
         <Child childId="31" name="Sandy"/>
      </Human>
   </House>
   <House houseId="5" address="987 Wall">
      <Dog dogId="13" name="Rover">
         <Flea fleaId="17" name="Chester"/>
         <Flea fleaId="19" name="Wilhelm"/>
      </Dog>
   </House>
</Xml>

and with this extended XML document (text nodes added):

<Xml>
    <House houseId="3" address="123 Main">
        <Dog dogId="13" name="Rover">
          Dog named Rover
            <Flea fleaId="17" name="Chester">Regular dog flee</Flea>
        </Dog>
        <Dog dogId="13" name="Rover">
            <Flea fleaId="23" name="Poindexter">Flea named Poindexter</Flea>
        </Dog>
    </House>
    <House houseId="3" address="123 Main">
        <Human humanId="9" name="Mr. Johnson">
            <Child childId="11" name="Susie">Susan Johnson</Child>
        </Human>
        <Human humanId="9" name="Mr. Johnson">
            <Child childId="31" name="Sandy">Sandy Johnson</Child>
        </Human>
    </House>
    <House houseId="5" address="987 Wall">
        <Dog dogId="13" name="Rover">
            <Flea fleaId="17" name="Chester" />
        </Dog>
        <Dog dogId="13" name="Rover">
            <Flea fleaId="19" name="Wilhelm" />
        </Dog>
    </House>
</Xml>

again the correct result is produced:

<Xml>
   <House houseId="3" address="123 Main">
      <Dog dogId="13" name="Rover">
          Dog named Rover
            <Flea fleaId="17" name="Chester">Regular dog flee</Flea>
         <Flea fleaId="23" name="Poindexter">Flea named Poindexter</Flea>
      </Dog>
      <Human humanId="9" name="Mr. Johnson">
         <Child childId="11" name="Susie">Susan Johnson</Child>
         <Child childId="31" name="Sandy">Sandy Johnson</Child>
      </Human>
   </House>
   <House houseId="5" address="987 Wall">
      <Dog dogId="13" name="Rover">
         <Flea fleaId="17" name="Chester"/>
         <Flea fleaId="19" name="Wilhelm"/>
      </Dog>
   </House>
</Xml>

Explanation:

We use two functions: my:signature() and my:grouping():

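  1. my:signature() creates a signature for each element: a pipe-separated string made of the element name followed by every attrName+value pair, sorted by attrName.

  2. my:grouping() groups the elements passed in its second argument by their my:signature(). For each group it copies the first element (with its attributes and non-element child nodes), then calls itself on the child elements of all members of the group (current-group()/*). Duplicates are therefore merged at every level, and only elements that end up under the same parent are ever compared.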
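
To see which strings the grouping actually compares, a small diagnostic stylesheet can help. The following is only a sketch, not part of the answer above. It contains a variant of my:signature() that keeps the attribute nodes (as="attribute()*") so that both names and values are available, plus a hypothetical helper, my:path-key(), that builds the ancestry key described in the question. The helper assumes that each element's id attribute is the one whose name ends in "Id", which holds for the sample data but is not guaranteed in general.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="my:my">
  <xsl:output method="text"/>

  <!-- Element name followed by name+value of each attribute, sorted by attribute name -->
  <xsl:function name="my:signature" as="xs:string">
    <xsl:param name="pElem" as="element()"/>
    <xsl:variable name="vAttribs" as="attribute()*">
      <xsl:perform-sort select="$pElem/@*">
        <xsl:sort select="name()"/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:sequence select="
      string-join((name($pElem),
                   for $at in $vAttribs return concat(name($at), '+', $at)),
                  '|')"/>
  </xsl:function>

  <!-- Hypothetical helper (not in the answer): the question's ancestry key,
       e.g. houseId=3;dogId=13. Assumes the id attribute's name ends in 'Id'. -->
  <xsl:function name="my:path-key" as="xs:string">
    <xsl:param name="pElem" as="element()"/>
    <xsl:sequence select="
      string-join(
        for $e in $pElem/ancestor-or-self::*,
            $id in $e/@*[ends-with(name(), 'Id')][1]
        return concat(name($id), '=', $id),
        ';')"/>
  </xsl:function>

  <!-- Print one line per element below the root: signature, then ancestry key -->
  <xsl:template match="/">
    <xsl:for-each select="/*//*">
      <xsl:value-of select="my:signature(.), my:path-key(.)" separator="  ::  "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document with an XSLT 2.0 processor, this should print lines such as `Dog|dogId+13|name+Rover  ::  houseId=3;dogId=13`. The signature alone is identical for the Rover elements in both houses. The recursive grouping still keeps them apart, because my:grouping() is only ever handed the children of one merged parent at a time, so an explicit ancestry key is not required.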

Licensed under: CC-BY-SA with attribution