Question

I'm having an interesting issue: I'm trying to daisy-chain two transformations (using XSLT 2.0), removing the CDATA sections from the original XML in the first pass so that their content can be parsed as XML in the second. Whilst I doubt it will affect the result, I'm using the initial template option in Saxon 9 HE and the collection() function to gather multiple XML documents (same namespaces) into a variable before putting it through the transformation. It's worth noting I can't even get this to work with one document at the moment.

My input document(s):

<root>
    <blah>
        <![CDATA[<elementA att="A"><elementB att="B">Text</elementB></elementA>]]>
    </blah>
</root>

My XSLT attempt:

    <!-- Collect all XML files in $input folder for processing -->
    <xsl:variable name="xml" select="collection(concat($input,'?select=*.*ml;recurse=no;on-error=ignore'))"/>

    <!-- Initial template is called from the saxon command line using the -it:process option -->
    <xsl:template name="process">
        <!-- First pass -->
        <xsl:variable name="pass1xml">
            <xsl:apply-templates select="$xml" mode="pass1"/>
        </xsl:variable>
        <!-- First pass output -->
        <xsl:result-document href="{concat($output,'\pass1.xml')}" method="xml" indent="yes">
            <xsl:copy-of select="$pass1xml"/>
        </xsl:result-document>
        <!-- Second pass -->
        <xsl:apply-templates select="$pass1xml" mode="pass2"/>
    </xsl:template>

    <!-- First pass: copy everything -->
    <xsl:template match="@* | node()" mode="pass1">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" mode="pass1"/>
        </xsl:copy>
    </xsl:template>

    <!-- First pass: strip CDATA from element -->
    <xsl:template match="blah" mode="pass1">
        <xsl:copy>
            <xsl:value-of select="." disable-output-escaping="yes"/>
        </xsl:copy>
    </xsl:template>

    <!-- Second pass using $pass1xml variable -->
    <xsl:template match="root" mode="pass2">
        <!-- Second pass output -->
        <xsl:result-document href="{concat($output,'\pass2.xml')}" method="xml" indent="yes">
            <xsl:apply-templates select="descendant::elementA" mode="elementA"/>
        </xsl:result-document>      
    </xsl:template>

...etc (continue with second pass)...

Desired output from first pass:

<root>
    <blah>
        <elementA att="A">
            <elementB att="B">Text</elementB>
        </elementA>
    </blah>
</root>

What I'm seeing at the moment in my pass1.xml (the result of the first pass, in spite of using disable-output-escaping="yes") is escaped XML, which obviously can't be navigated with XPath in pass2:

<root>
    <blah>
        &lt;elementA att="A"&gt;&lt;elementB att="B"&gt;Text&lt;/elementB&gt;&lt;/elementA&gt;
    </blah>
</root>

Unfortunately I can't change my source document to remove the CDATA (I appreciate this would solve my issue). The XML within the CDATA will always be well-formed too, so I have no qualms about stripping it out. Perhaps I'm misunderstanding the daisy chain approach which means what I'm trying to achieve isn't possible - in any case I'm keen to learn.

Many thanks for your time and advice - it's highly appreciated!

Solution

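disable-output-escaping is a serialization feature, so it does not help with in-memory nodes that you want to pass on to a second transformation step in the same stylesheet. The temporary tree held in $pass1xml is never serialized and reparsed, so the content of blah stays a single text node. To make use of disable-output-escaping you would need two stylesheets, where the result of the first is serialized before being fed to the second.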
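
A minimal sketch of the first of two such stylesheets, assuming a single input document rather than the collection() from the question. The file names and the command lines in the closing comment are placeholders, and the sketch relies on the serializer honouring disable-output-escaping, which is an optional feature:

    <!-- pass1.xsl (hypothetical file name): unwraps the CDATA content on output -->
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <xsl:output method="xml" indent="yes"/>

        <!-- Identity template: copy everything unchanged -->
        <xsl:template match="@* | node()">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- Write the text of blah unescaped. This only takes effect because
             the result goes straight to the serializer, not into a variable. -->
        <xsl:template match="blah">
            <xsl:copy>
                <xsl:value-of select="." disable-output-escaping="yes"/>
            </xsl:copy>
        </xsl:template>

        <!-- Assumed invocation, one run per stylesheet:
             java -jar saxon9he.jar -s:input.xml -xsl:pass1.xsl -o:pass1.xml
             java -jar saxon9he.jar -s:pass1.xml -xsl:pass2.xsl -o:pass2.xml -->

    </xsl:stylesheet>

The second stylesheet (pass2.xsl here) then reads pass1.xml from disk, where elementA and elementB have been parsed as ordinary elements.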

As you mention Saxon, I would however consider using one of the commercial versions, which offer extension functions as well as XSLT/XPath 3.0 functions like parse-xml or parse-xml-fragment. With those you can simply parse and process the contents of the element, e.g.

<xsl:template match="blah">
  <xsl:apply-templates select="parse-xml-fragment(.)/node()"/>
</xsl:template> 
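
To show how this fits the two-pass structure from the question, here is a self-contained sketch. It assumes a processor that supports the XPath 3.0 function parse-xml-fragment, reads a single source document instead of a collection, and uses a hypothetical result wrapper element in the second pass:

    <xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <xsl:output method="xml" indent="yes"/>

        <!-- Entry point: build the first-pass tree in memory, then run pass 2 on it -->
        <xsl:template match="/">
            <xsl:variable name="pass1xml">
                <xsl:apply-templates select="node()" mode="pass1"/>
            </xsl:variable>
            <xsl:apply-templates select="$pass1xml" mode="pass2"/>
        </xsl:template>

        <!-- First pass: identity copy -->
        <xsl:template match="@* | node()" mode="pass1">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()" mode="pass1"/>
            </xsl:copy>
        </xsl:template>

        <!-- First pass: parse the text of blah into real nodes -->
        <xsl:template match="blah" mode="pass1">
            <xsl:copy>
                <xsl:copy-of select="parse-xml-fragment(.)/node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- Second pass: elementA can now be selected with XPath -->
        <xsl:template match="/" mode="pass2">
            <result>
                <xsl:copy-of select="root/blah/elementA"/>
            </result>
        </xsl:template>

    </xsl:stylesheet>

Because the markup is parsed into nodes during the first pass, no serialization step is needed between the two passes and disable-output-escaping is not used at all.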

As an alternative, in Saxon 9.1 B there is an extension function available even in the open source version.
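
The answer does not name that extension function. Assuming it refers to saxon:parse() in the http://saxon.sf.net/ namespace, the first-pass template could be sketched as follows. Unlike parse-xml-fragment, this function expects the string to be a complete well-formed document with a single root element, which holds for the sample input:

    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:saxon="http://saxon.sf.net/"
        exclude-result-prefixes="saxon">

        <!-- First pass: identity copy -->
        <xsl:template match="@* | node()" mode="pass1">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()" mode="pass1"/>
            </xsl:copy>
        </xsl:template>

        <!-- Assumption: saxon:parse() is the extension function meant above.
             It parses the string value of blah and returns a document node. -->
        <xsl:template match="blah" mode="pass1">
            <xsl:copy>
                <xsl:copy-of select="saxon:parse(string(.))/node()"/>
            </xsl:copy>
        </xsl:template>

    </xsl:stylesheet>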

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow