Transform XML with XSLT and preserve CDATA (in Ruby)

https://stackoverflow.com/questions/1502083

19-09-2019
|

Question

I am trying to convert a document with content like the following into another document, leaving the CDATA exactly as it was in the first document, but I haven't figured out how to preserve the CDATA with XSLT.

Initial XML:

<node>
    <subNode>
        <![CDATA[ HI THERE ]]>
    </subNode>
    <subNode>
        <![CDATA[ SOME TEXT ]]>
    </subNode>
</node>

Final XML:

<newDoc>
    <data>
        <text>
            <![CDATA[ HI THERE ]]>
        </text>
        <text>
            <![CDATA[ SOME TEXT ]]>
        </text>
    </data>
</newDoc>

I've tried something like this, but no luck, everything gets jumbled:

<xsl:element name="subNode">
    <xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:element>

Any ideas how to preserve the CDATA?

Thanks! Lance

Using ruby/nokogiri

Update: Here's something that works.

<text disable-output-escaping="yes">&lt;![CDATA[</text>
<value-of select="normalize-space(text())" disable-output-escaping="yes"/>
<text disable-output-escaping="yes">]]&gt;</text>

That will wrap all text() nodes in CDATA, which works for what I need, and it will preserve html tags inside the text.

Solution

You cannot preserve the precise sequence of CDATA nodes if they're mixed with plain text nodes. At best, you can force all content of a particular element in the output to be CDATA, by listing that element name in xsl:output/@cdata-section-elements:

<xsl:output cdata-section-elements="text"/>

OTHER TIPS

Sorry to post an answer to my own question, but I found something that works:


<text disable-output-escaping="yes">&lt;![CDATA[</text>
<value-of select="normalize-space(text())" disable-output-escaping="yes"/>
<text disable-output-escaping="yes">]]&gt;</text>

That will wrap all text() nodes in CDATA, which works for what I need, and it will preserve html tags inside the text.

I found this article while trying to solve a similar problem (using an XSL transform to take one XML file and create a partial/subset copy of some of the nodes in it, as a second XML file). In my case the first XML files have some elements whose values are entirely wrapped in CDATA blocks, because they happen to be JSON and they carry some HTML formatting markup.

What I found was that rather than using xsl:value-of, I could use xsl:copy-of, and just as @Pavel Minaev points out, I could keep the original CDATA intact by listing every relevant element name in the xsl:output declaration. This might be an approach that would work for the OP.

XML to be copied (sample):

<text_item>
  <id>100</id>
  <stem_text><![CDATA[(any string of text, including HTML)]]></stem_text>
  <answerOptions><![CDATA[{"choices":[{"label":"Atmospheric O<sub>2</sub>",
   "value":"A"},{"label":"Released CO<sub>2</sub>",
   "value":"B"}]}]]></answerOptions>
 ...
</text_item>

Relevant stylesheet lines:

<xsl:output method="xml" indent="yes" cdata-section-elements="stem_text answerOptions" />
...
<xsl:apply-templates select="//text_item" >
...
<xsl:template match="text_item">
    <xsl:element name="text_item" >
        <xsl:copy-of select="node()"  />
    </xsl:element>
</xsl:template>

The cdata-section-elements attribute means that in the output, the original CDATA blocks in the XML copied from will be passed through, as-is, to the output XML file when the transform runs. It appears that you can name as many elements as you want.

In the OP's example, I believe he would select on //node/subNode and then build an element named text, inside newDoc/data of course. His cdata-section-elements attribute would be simply ="text", exactly as Pavel has it.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow