Question

I have some XML like this:

<TEI>
  <text>
    <div type="scene" n="1">
      <sp xml:id="sp1">
        <speaker>Julius</speaker>
        <l>Lorem ipsum dolor sit amet</l>
        <ptr cRef="..." />
        <stage>Aside</stage>
        <ptr cRef="..." />
        <l>consectetur adipisicing elit</l>
        <stage>To Antony</stage>
        <l>sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</l>
      </sp>
      <sp xml:id="sp2">
        ...

And I need to lift all the <stage> elements up one level to become siblings of the <sp>s, breaking the <sp>s up so that the <stage> elements retain their preceding and following relations with the other elements inside the <sp>, e.g.

<TEI>
  <text>
    <div type="scene" n="1"> 
     <sp by="#Julius">
       <l>Lorem ipsum dolor sit amet</l>
       <ptr cRef="..." />
     </sp>
     <stage>Aside</stage>
     <sp by="#Julius">
       <ptr cRef="..." />
       <l>consectetur adipisicing elit</l>
     </sp>
     <stage>To Antony</stage>
     <sp by="#Julius">
       <l>sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</l>
     </sp>

I've been working on an XSLT to do this. It includes a recursive template which is intended to consume all the child elements of an <sp> up to (but not including) the first <stage> child and emit them in the result tree as children of a new <sp>. Then emit the first <stage> element. And then recurse on all the elements following that first <stage> element. Eventually, when the list of child elements has no <stage>s left, all the remaining elements are emitted in the result tree inside a new <sp>. Here's the code, including debugging <xsl:message>s:

<xsl:template name="sp-with-stage">
  <!-- call with speaker -->
  <xsl:param name="speaker" />
  <!-- call with an <sp> element -->
  <xsl:param name="sp" />
  <!-- $content parameter is optional, by default it's the children of the given $sp; this is the parameter whose value is different with each recursive call -->
  <xsl:param name="content" select="$sp/*" />
  <!-- find the first <stage> element amongst the $content node set -->
  <xsl:variable name="stage" select="$content/following-sibling::stage[1]" />

  <xsl:message>ID = <xsl:value-of select="$sp/@xml:id" /></xsl:message>
  <xsl:message>speaker = "<xsl:value-of select="$speaker" />"</xsl:message>
  <xsl:message>content length = <xsl:value-of select="count($content)" /></xsl:message>
  <xsl:if test="$stage">
  <xsl:message>nodes before $stage = <xsl:value-of select="count($stage/preceding-sibling::*)" /></xsl:message>
  <xsl:message>nodes after $stage = <xsl:value-of select="count($stage/following-sibling::*)" /></xsl:message>
  </xsl:if>

  <xsl:if test="$stage">
    <sp by="#{$speaker}">
      <!-- process all the nodes in the $content node set before the current <stage> -->
      <xsl:message>Processing <xsl:value-of select="count($stage/preceding-sibling::*)" /> nodes before "<xsl:value-of select="$stage/text()" />"</xsl:message>
      <xsl:apply-templates select="$stage/preceding-sibling::*" />
    </sp>
    <xsl:apply-templates select="$stage" />
  </xsl:if>
  <xsl:choose>
    <xsl:when test="$stage/following-sibling::stage">
      <!-- if there's another <stage> element in the $content node set then call this template recursively -->
      <xsl:message>Call recursively with <xsl:value-of select="count($stage/following-sibling::*)" /> following nodes</xsl:message>
      <xsl:call-template name="sp-with-stage">
        <xsl:with-param name="speaker"><xsl:value-of select="$speaker" /></xsl:with-param>
        <xsl:with-param name="sp" select="$sp" />
        <!-- the $content node set for this call is all the nodes after the current <stage> -->
        <xsl:with-param name="content" select="$stage/following-sibling::*" />
      </xsl:call-template>
    </xsl:when>
    <xsl:when test="$stage/following-sibling::*">
      <!-- if there's no <stage> element in the $content node set, but there are still some elements, emit them in an <sp> element -->
      <sp by="#{$speaker}">
        <xsl:message>Processing <xsl:value-of select="count($stage/following-sibling::*)" /> trailing nodes</xsl:message>
        <xsl:apply-templates select="$stage/following-sibling::*" />
      </sp>
    </xsl:when>
  </xsl:choose>
</xsl:template>

This template is then called like this:

<xsl:template match="sp[stage]">
  <xsl:call-template name="sp-with-stage">
    <xsl:with-param name="speaker"><xsl:value-of select="speaker" /></xsl:with-param>
    <xsl:with-param name="sp" select="." />
  </xsl:call-template>
</xsl:template>

The problem is with my use of $stage/preceding-sibling::* by which I mean to process just the nodes from the current $content node set that precede the current $stage node. What actually happens is that, in every recursive call, all of the nodes which preceded the current $stage node from its original <sp> context are selected by this $stage/preceding-sibling::*. This is despite the fact that the recursive calls get the correct new $content node set each time and that the $stage node is being taken from that correct $content node set.

To clarify, in the case of the above example XML, when <stage>To Antony</stage> is the $stage node and the $content node set contains just:

<l>consectetur adipisicing elit</l>
<stage>To Antony</stage>
<l>sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</l>

the $stage/preceding-sibling::* expression still yields all the children of the original <sp> up to <stage>To Antony</stage>.

I guess there must be something about preceding-sibling that I'm not properly understanding. Any suggestions? Or even any suggestions of completely different ways to achieve the transformation?


Solution 2

This is a grouping problem - you want to group together all the elements inside each sp (except speaker and stage) by their closest preceding stage (if there is one). The standard approach to this in XSLT 1.0 is called Muenchian grouping. You define a key giving the grouping criteria and then use a generate-id trick to process the first node in each group as a proxy for the group as a whole.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes" />

  <!-- group first by the parent sp and then by the nearest preceding stage.
       generate-id(emptynodeset) is the empty string by definition, so this
       is still well defined for the elements before the first stage in an sp -->
  <xsl:key name="groupKey" match="sp/*[not(self::speaker | self::stage)]" use="
     concat(generate-id(..), '|', generate-id(preceding-sibling::stage[1]))" />

  <!-- identity template - copy everything as-is unless overridden -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
  </xsl:template>

  <xsl:template match="sp">
    <!-- for each group -->
    <xsl:for-each select="*[generate-id() = generate-id(key('groupKey',
          concat(generate-id(..), '|', generate-id(preceding-sibling::stage[1]))
        )[1])]">
      <!-- the "stage" if there is one - if we are before the first stage in this
           sp then the preceding-sibling:: will select nothing -->
      <xsl:apply-templates select="preceding-sibling::stage[1]" />
      <sp by="#{../speaker}">
        <!-- the following elements up to the next stage -->
        <xsl:apply-templates select="key('groupKey',
          concat(generate-id(..), '|', generate-id(preceding-sibling::stage[1]))
        )" />
      </sp>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

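This works on your example input but will need some alterations if a stage is ever followed directly by another stage, or is the last child of its sp. A stage is only emitted via the first element of the group that follows it, so a stage with no such group is silently dropped.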
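
As for why $stage/preceding-sibling::* keeps returning everything: an XPath axis always walks the source tree from its context node. It knows nothing about the $content node set that $stage happened to be picked from, so preceding-sibling:: always reaches back to the first child of the original sp. (Similarly, $content/following-sibling::stage[1] selects the next stage after each node in $content, not the first stage within $content.) If you want to keep the recursive approach, you have to intersect the axis result with $content yourself, for which XSLT 1.0 has the count(. | $set) = count($set) idiom. Below is a minimal sketch of your template with that change. The names split-sp and $before are made up for illustration, and it assumes the elements are in no namespace, as in your sample; treat it as a sketch rather than a tested stylesheet.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <!-- identity template - copy everything as-is unless overridden -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <xsl:template match="sp[stage]">
    <xsl:call-template name="split-sp">
      <xsl:with-param name="speaker" select="string(speaker)"/>
      <!-- everything in the sp except the speaker -->
      <xsl:with-param name="content" select="*[not(self::speaker)]"/>
    </xsl:call-template>
  </xsl:template>

  <!-- split-sp is a hypothetical name, not from the question -->
  <xsl:template name="split-sp">
    <xsl:param name="speaker"/>
    <xsl:param name="content"/>
    <!-- the first stage that is itself a member of $content -->
    <xsl:variable name="stage" select="$content[self::stage][1]"/>
    <xsl:choose>
      <xsl:when test="$stage">
        <!-- intersection: members of $content that are also
             preceding siblings of $stage -->
        <xsl:variable name="before" select="$content[
            count(. | $stage/preceding-sibling::*)
              = count($stage/preceding-sibling::*)]"/>
        <!-- skip the wrapper when nothing precedes the stage -->
        <xsl:if test="$before">
          <sp by="#{$speaker}">
            <xsl:apply-templates select="$before"/>
          </sp>
        </xsl:if>
        <xsl:apply-templates select="$stage"/>
        <!-- recurse on whatever follows this stage -->
        <xsl:call-template name="split-sp">
          <xsl:with-param name="speaker" select="$speaker"/>
          <xsl:with-param name="content" select="$stage/following-sibling::*"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:when test="$content">
        <!-- no stage left: emit the trailing elements in one sp -->
        <sp by="#{$speaker}">
          <xsl:apply-templates select="$content"/>
        </sp>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Because the recursion always continues after emitting a stage and only wraps non-empty runs, this variant should also cope with consecutive or trailing stage elements, at the cost of one recursive call per stage.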

OTHER TIPS

I suspect you are making this much more complicated than it needs to be. Have a look at the following stylesheet:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="sp">
    <xsl:copy>
        <xsl:copy-of select="speaker"/>
        <xsl:copy-of select="l[1]"/>
    </xsl:copy>
    <xsl:apply-templates select="stage | l[position() > 1]"/>
</xsl:template>

<xsl:template match="l">
    <sp>
        <xsl:copy-of select="."/>
    </sp>
</xsl:template>

</xsl:stylesheet>

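Note that it only deals with speaker, l and stage children (anything else inside an sp, such as ptr, is not copied) and that it puts each l after the first into an sp of its own. When applied to the following example input: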

<root>
    <sp id="sp1">
      <speaker>Julius</speaker>
      <l>Lorem ipsum dolor sit amet</l>
      <stage>Aside</stage>
      <l>consectetur adipisicing elit</l>
      <stage>To Antony</stage>
      <l>sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</l>
    </sp>
    <sp id="sp2">
      <speaker>Antony</speaker>
      <l>Nullam at dui.</l>
      <stage>Front</stage>
      <l>Nunc lobortis. </l>
    </sp>
</root>

the result is:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <sp>
      <speaker>Julius</speaker>
      <l>Lorem ipsum dolor sit amet</l>
   </sp>
   <stage>Aside</stage>
   <sp>
      <l>consectetur adipisicing elit</l>
   </sp>
   <stage>To Antony</stage>
   <sp>
      <l>sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</l>
   </sp>
   <sp>
      <speaker>Antony</speaker>
      <l>Nullam at dui.</l>
   </sp>
   <stage>Front</stage>
   <sp>
      <l>Nunc lobortis. </l>
   </sp>
</root>
Licensed under: CC-BY-SA with attribution