Question

I have an XML file in which everything is well structured except for ordered lists. Every list item is tagged as a paragraph <P>, with the enumeration added manually: (1). I want to create a valid HTML list from that source.

Using xsl:analyze-string with xsl:matching-substring and a regular expression, I was able to extract every list item, but I can't find a way to add the surrounding <ol> tags.

Here is an example:

XML source:

<Content>
    <P>(1) blah</P>
    <P>(2) blah</P>
    <P>(2) blah</P>
</Content>

What I have so far:

<xsl:variable name="text" select="/Content/*/text()"/>
<xsl:analyze-string select="$text" regex="(\(\d+\))([^(]*)">
    <xsl:matching-substring>    
        <![CDATA[<li>]]><xsl:value-of select="regex-group(2)"/><![CDATA[</li>]]>
    </xsl:matching-substring>
</xsl:analyze-string>

Output:

<li>blah</li>
<li>blah</li>
<li>blah</li>

In case you are wondering: output has to be plain text in general, only the contents of the $text variable have to be output in HTML. That's why I am using <![CDATA[]].

Solution

As simple as this:

I. XSLT 2.0 solution:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/*">
  <ol>
    <xsl:apply-templates/>
  </ol>
 </xsl:template>

 <xsl:template match="P[matches(., '(^\(\d+\)\s*)(.*)')]">
    <li>
        <xsl:analyze-string select="." regex="(^\(\d+\)\s*)(.*)">
            <xsl:matching-substring>
              <xsl:value-of select="regex-group(2)"/>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </li>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<Content>
    <P>(1) blah</P>
    <P>(2) blah</P>
    <P>(2) blah</P>
</Content>

the wanted, correct result is produced:

<ol>
    <li>blah</li>
    <li>blah</li>
    <li>blah</li>
</ol>
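
If only the text after the number is needed, the xsl:analyze-string block may not be necessary at all. As a hedged sketch (my assumption, not part of the stylesheet above), the same template could strip the leading "(n)" with the standard XPath 2.0 replace() function:

 <!-- Sketch: same match pattern as above, but the "(n) " prefix is removed
      with replace() instead of xsl:analyze-string -->
 <xsl:template match="P[matches(., '^\(\d+\)')]">
    <li>
        <xsl:value-of select="replace(., '^\(\d+\)\s*', '')"/>
    </li>
 </xsl:template>

For <P>(1) blah</P> this should again give <li>blah</li>. Like the original, it uses only the string value of P, so any child markup inside a paragraph is flattened to text.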

II. XSLT 1.0 solution:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/*">
  <ol>
    <xsl:apply-templates/>
  </ol>
 </xsl:template>

 <xsl:template match=
  "P[starts-with(.,'(')
   and
     floor(substring-before(substring(.,2), ')'))
    =
     substring-before(substring(.,2), ')')
    ]">
    <li>
         <xsl:value-of select="substring-after(., ') ')"/>
    </li>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the same XML document (above), the same correct result is produced:

<ol>
   <li>blah</li>
   <li>blah</li>
   <li>blah</li>
</ol>

Other tips

This is not really a solution, but a suggested slight improvement on Dimitre's solution.

(1) The template match condition for the XSLT 2.0 solution can be simplified to ...

<xsl:template match="P[matches(., '^\(\d+\)')]">

Having said that, the regex for the xsl:analyze-string should remain as it is.

(2) This is possibly outside the scope of the question, but the question reads as if HTML is the intended output, so the OP should consider using the html output method (method="html" on xsl:output).
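
To illustrate, here is a minimal sketch of how both suggestions might be combined in the XSLT 2.0 stylesheet. It assumes the whole result really is meant to be HTML (the question says most of the output is plain text, so this may not fit as-is), and it borrows xsl:strip-space from the XSLT 1.0 solution:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <!-- Suggestion (2): serialize the result with the html output method -->
 <xsl:output method="html" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/*">
  <ol>
    <xsl:apply-templates/>
  </ol>
 </xsl:template>

 <!-- Suggestion (1): the shorter match condition -->
 <xsl:template match="P[matches(., '^\(\d+\)')]">
    <li>
        <!-- The regex for xsl:analyze-string stays as in the original solution -->
        <xsl:analyze-string select="." regex="(^\(\d+\)\s*)(.*)">
            <xsl:matching-substring>
              <xsl:value-of select="regex-group(2)"/>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </li>
 </xsl:template>
</xsl:stylesheet>

With method="html" no XML declaration is written, so omit-xml-declaration is left out here.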

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow