Question

I have this:

<root>
<row>
    <field>&amp;lt;![CDATA[&amp;lt;comprobante xmlns:xsi="http://www.w3.org/2001/XMLSchema"&amp;gt;
        &amp;lt;inicioCFD&amp;gt;
        &amp;lt;idArchivo&amp;gt;182NAI053402&amp;lt;/idArchivo&amp;gt;
        &amp;lt;etiquetaCFD&amp;gt;NCR&amp;lt;/etiquetaCFD&amp;gt;
        &amp;lt;/inicioCFD&amp;gt;
        &amp;lt;/comprobante&amp;gt;]]&amp;gt;</field>
</row>
</root>

I need this:

<comprobante>
  <idArchivo etiquetaCFD="NCR">182NAI053402</idArchivo>
</comprobante>

I'm using this xslt:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
xmlns:a="http://www.tralix.com/cfd/2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema"
xmlns:xalan="http://xml.apache.org/xalan"
extension-element-prefixes="exsl xalan">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes"/>
<xsl:template match="/root/row/field">
    <xsl:variable name="comprobante_">
        <xsl:variable name="p6">
            <xsl:variable name="p5">
                <xsl:variable name="p4">
                    <xsl:variable name="p3">
                        <xsl:variable name="p2">
                            <xsl:variable name="p1">
                                <xsl:value-of select="substring-before(substring-after(.,'CDATA['),']]')"/>
                            </xsl:variable>
                            <xsl:call-template name="replace-string">
                                <xsl:with-param name="text" select="$p1"/>
                                <xsl:with-param name="replace" select="'gt;'" />
                                <xsl:with-param name="with" select="'¬'"/>
                            </xsl:call-template>
                        </xsl:variable>
                        <xsl:call-template name="replace-string">
                            <xsl:with-param name="text" select="$p2"/>
                            <xsl:with-param name="replace" select="'lt;'"/>
                            <xsl:with-param name="with" select="'~'"/>
                        </xsl:call-template>
                    </xsl:variable>
                    <xsl:call-template name="replace-string">
                        <xsl:with-param name="text" select="$p3"/>
                        <xsl:with-param name="replace" select="'&amp;~'"/>
                        <xsl:with-param name="with" select="'&lt;'"/>
                    </xsl:call-template>
                </xsl:variable>
                <xsl:call-template name="replace-string">
                    <xsl:with-param name="text" select="$p4"/>
                    <xsl:with-param name="replace" select="'&amp;¬'"/>
                    <xsl:with-param name="with" select="'&gt;'"/>
                </xsl:call-template>
            </xsl:variable>
            <xsl:value-of select="$p5" disable-output-escaping="yes"/>
        </xsl:variable>
        <xsl:copy-of select="xalan:nodeset($p6)"/>
    </xsl:variable>
    <xsl:variable name="comprobante" select="xalan:nodeset($comprobante_)"/>
    <comprobante>
      <idArchivo>
          <xsl:attribute name="etiquetaCFD">
              <xsl:value-of select="$comprobante/comprobante/inicioCFD/etiquetaCFD"/>
          </xsl:attribute>
              <xsl:value-of select="$comprobante/comprobante/inicioCFD/idArchivo"/>
      </idArchivo>  
    </comprobante>
</xsl:template>
<xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
        <xsl:when test="contains($text,$replace)">
            <xsl:value-of select="substring-before($text,$replace)"/>
            <xsl:value-of select="$with"/>
            <xsl:call-template name="replace-string">
                <xsl:with-param name="text"
                    select="substring-after($text,$replace)"/>
                <xsl:with-param name="replace" select="$replace"/>
                <xsl:with-param name="with" select="$with"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>
</xsl:stylesheet>

It produces this:

<comprobante>
  <idArchivo etiquetaCFD=""></idArchivo>
</comprobante>

The values are empty because escaped XML is plain text, not an XML node tree, as the post "XSLT: How to transform partially escaped XML?" explains, so I can't read anything from my $comprobante variable.
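
A minimal sketch of that behaviour, assuming a processor that supports the EXSLT exsl:node-set() function (Xalan does). The variable names rtf and nodes are made up for illustration. disable-output-escaping only affects how a text node is serialized, so the markup-looking string stays a single text node inside the variable:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- The variable holds one text node whose value merely looks like markup -->
    <xsl:variable name="rtf">
      <xsl:value-of select="'&lt;a&gt;1&lt;/a&gt;'" disable-output-escaping="yes"/>
    </xsl:variable>
    <!-- node-set() converts the fragment to a node-set; it does not re-parse text -->
    <xsl:variable name="nodes" select="exsl:node-set($rtf)"/>
    <xsl:text>elements: </xsl:text>
    <xsl:value-of select="count($nodes//*)"/>
    <xsl:text>, text nodes: </xsl:text>
    <xsl:value-of select="count($nodes//text())"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should report no element nodes in the variable, which is why paths such as $comprobante/comprobante/inicioCFD/idArchivo select nothing.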

In that post, Dimitri says saxon:parse() can be used for this. I'm using the Xalan processor, though, and I couldn't find anything similar. I'm limited to Xalan and XSLT 1.0.

Any help?

Thanks in advance

Solution 2

@IanRoberts, @MichaelKay, with your help I figured out how to write a parser for escaped XML in XSLT 1.0, and this working stylesheet is the result. Thanks for your help!

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:exsl="http://exslt.org/common" 
extension-element-prefixes="exsl"
xmlns:xsi="http://www.w3.org/2001/XMLSchema">
<xsl:output indent="yes"/>
<xsl:template match="/">
    <xsl:variable name="withoutCDataStart"
        select="substring-after(root/row/field, '&amp;lt;![CDATA[')"/>
    <xsl:variable name="withoutCDataEnd"
        select="substring-before($withoutCDataStart, ']]&amp;gt;')"/>
    <xsl:variable name="unEscapedXml">
        <xsl:call-template name="unescape">
            <xsl:with-param name="text" select="$withoutCDataEnd"/>
        </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="parsedXml_">
        <xsl:call-template name="parseXml">
            <xsl:with-param name="text" select="$unEscapedXml"/>
        </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="parsedXml" select="exsl:node-set($parsedXml_)"/>
    <comprobante>
        <idArchivo>
            <xsl:attribute name="etiquetaCFD">
                <xsl:value-of select="$parsedXml/comprobante/inicioCFD/etiquetaCFD"/>
            </xsl:attribute>
            <xsl:value-of select="$parsedXml/comprobante/inicioCFD/idArchivo"/>
        </idArchivo>  
    </comprobante>
</xsl:template>
<xsl:template name="unescape">
    <xsl:param name="text"/>
    <xsl:choose>
        <xsl:when test="contains($text, '&amp;')">
            <xsl:value-of select="substring-before($text, '&amp;')"/>
            <xsl:variable name="afterAmp" select="substring-after($text, '&amp;')"/>
            <xsl:choose>
                <xsl:when test="starts-with($afterAmp, 'amp;')">&amp;</xsl:when>
                <xsl:when test="starts-with($afterAmp, 'lt;')">&lt;</xsl:when>
                <xsl:when test="starts-with($afterAmp, 'gt;')">&gt;</xsl:when>
                <xsl:when test="starts-with($afterAmp, 'quot;')">"</xsl:when>
                <xsl:when test="starts-with($afterAmp, 'apos;')">'</xsl:when>
            </xsl:choose>
            <xsl:call-template name="unescape">
                <xsl:with-param name="text" select="substring-after($afterAmp, ';')"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>
<xsl:template name="parseXml">
    <xsl:param name="text"/>
    <xsl:choose>
        <xsl:when test="contains($text, '&gt;')">
            <xsl:variable name="topLevelTag">
                <xsl:call-template name="getTopLevelTag">
                    <xsl:with-param name="text" select="$text"/>
                </xsl:call-template>
            </xsl:variable>
            <xsl:variable name="openingTag">
                <xsl:value-of select="$topLevelTag"/>
            </xsl:variable>
            <xsl:variable name="tagName">
                <xsl:call-template name="getTopLevelTagName">
                    <xsl:with-param name="text" select="$text"/>
                </xsl:call-template>
            </xsl:variable>
            <xsl:variable name="closingTag">
                <xsl:value-of select="concat('&lt;/',$tagName,'&gt;')"/>
            </xsl:variable>
            <xsl:variable name="firstNode">
                <xsl:if test="not(contains($topLevelTag,'/&gt;'))">
                    <xsl:value-of select="substring-before(substring-after($text,$openingTag),$closingTag)"/>        
                </xsl:if>
            </xsl:variable>
            <xsl:variable name="afterFirstNode">
                <xsl:choose>
                    <xsl:when test="not(contains($topLevelTag,'/&gt;'))">
                        <xsl:value-of select="substring-after($text,concat($firstNode,$closingTag))"/>        
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:value-of select="substring-after($text,$topLevelTag)"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:variable>
            <xsl:element name="{$tagName}">
                <xsl:call-template name="createAttributes">
                    <xsl:with-param name="text" select="$topLevelTag"/>
                </xsl:call-template>
                <xsl:call-template name="parseXml">
                    <xsl:with-param name="text" select="$firstNode"/>
                </xsl:call-template>
            </xsl:element>
            <xsl:call-template name="parseXml">
                <xsl:with-param name="text" select="$afterFirstNode"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>
<xsl:template name="getTopLevelTagName">
    <xsl:param name="text"/>
    <xsl:choose>
        <xsl:when test="contains($text, '&gt;')">
            <xsl:variable name="tagWithAttributesWithoutEnd">
                <xsl:value-of select="substring-before($text, '&gt;')"/>
            </xsl:variable>
            <xsl:variable name="tagWithAttributesWithoutBegining">
                <xsl:value-of select="substring-after($tagWithAttributesWithoutEnd, '&lt;')"/>
            </xsl:variable>
            <xsl:variable name="tagName">
                <xsl:choose>
                    <xsl:when test="contains($tagWithAttributesWithoutBegining,' ')">
                        <xsl:value-of
                            select="substring-before($tagWithAttributesWithoutBegining, ' ')"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:value-of select="$tagWithAttributesWithoutBegining"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:variable>
            <xsl:value-of select="$tagName"/>
        </xsl:when>
    </xsl:choose>
</xsl:template>
<xsl:template name="getTopLevelTag">
    <xsl:param name="text"/>
    <xsl:choose>
        <xsl:when test="contains($text, '&gt;')">
            <xsl:variable name="tagWithAttributesWithoutEnd">
                <xsl:value-of select="substring-before($text, '&gt;')"/>
            </xsl:variable>
            <xsl:value-of select="concat($tagWithAttributesWithoutEnd,'&gt;')"/>
        </xsl:when>
    </xsl:choose>
</xsl:template>
<xsl:template name="createAttributes">
    <xsl:param name="text"/>
    <xsl:choose>
        <xsl:when test="contains($text, '=&quot;')">
            <xsl:variable name="attributeName">
                <xsl:value-of select="substring-before(substring-after($text,' '),'=&quot;')"/>
            </xsl:variable>
            <xsl:variable name="attributeValue">
                <xsl:value-of select="substring-before(substring-after($text,concat($attributeName,'=&quot;')),'&quot;')"/>
            </xsl:variable>
            <xsl:attribute name="{$attributeName}">
                <xsl:value-of select="$attributeValue"/>
            </xsl:attribute>
            <xsl:call-template name="createAttributes">
                <xsl:with-param name="text" select="substring-after($text,concat($attributeName,'=&quot;',$attributeValue,'&quot;'))"/>
            </xsl:call-template>
        </xsl:when>
    </xsl:choose>        
</xsl:template>
</xsl:stylesheet>

It produces my required output:

<comprobante xmlns:xsi="http://www.w3.org/2001/XMLSchema">
    <idArchivo etiquetaCFD="NCR">182NAI053402</idArchivo>
</comprobante>

I'm publishing my work in the hope that it will be helpful to someone else.

OTHER TIPS

Something like this would extract the escaped content from inside the field and output it as plain text (that happens to be well-formed XML):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text" />

  <xsl:template match="/">
    <xsl:variable name="withoutCDataStart"
       select="substring(root/row/field, 13)" />
    <xsl:variable name="withoutCDataEnd"
       select="substring($withoutCDataStart, 1,
                         string-length($withoutCDataStart) - 6)" />

    <xsl:call-template name="unescape">
      <xsl:with-param name="text" select="$withoutCDataEnd" />
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="unescape">
    <xsl:param name="text" />
    <xsl:choose>
      <xsl:when test="contains($text, '&amp;')">
        <xsl:value-of select="substring-before($text, '&amp;')" />
        <xsl:variable name="afterAmp" select="substring-after($text, '&amp;')" />
        <xsl:choose>
          <xsl:when test="starts-with($afterAmp, 'amp;')">&amp;</xsl:when>
          <xsl:when test="starts-with($afterAmp, 'lt;')">&lt;</xsl:when>
          <xsl:when test="starts-with($afterAmp, 'gt;')">&gt;</xsl:when>
          <xsl:when test="starts-with($afterAmp, 'quot;')">"</xsl:when>
          <xsl:when test="starts-with($afterAmp, 'apos;')">'</xsl:when>
        </xsl:choose>
        <xsl:call-template name="unescape">
          <xsl:with-param name="text" select="substring-after($afterAmp, ';')" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

You would then have to feed the output of this back into another stylesheet to do the actual transformation you want.
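
A minimal sketch of what that second stylesheet could look like, assuming the output of the first pass has been saved or piped as a document whose root element is comprobante (as in the question's sample data). How the two passes are chained is left to the calling code:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- The input here is the unescaped XML produced by the first pass -->
  <xsl:template match="/comprobante">
    <comprobante>
      <!-- Attribute value template copies etiquetaCFD into the attribute -->
      <idArchivo etiquetaCFD="{inicioCFD/etiquetaCFD}">
        <xsl:value-of select="inicioCFD/idArchivo"/>
      </idArchivo>
    </comprobante>
  </xsl:template>
</xsl:stylesheet>
```

Because the second pass sees real element nodes, ordinary XPath expressions work and no extension functions are needed.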

Licensed under: CC-BY-SA with attribution