Question

I am trying to convert an HTML document into a plain text document using XSLT. However, I am quite new to XSLT and I can't understand why the output of my transformation differs from my desired output.

My input HTML document:

<html>
<body>
  <h1>Heading 1</h1>
  <p class="first">First paragraph.</p>
  <p class="para">Regular paragraph 1.</p>
  <p class="para">Regular paragraph 2.</p>
  <p class="para">Regular paragraph 3.</p>
  <p class="last">Last paragraph.</p>
  <h2 class="someclass">Heading 2</h2>
  <p class="first">First paragraph 2.</p>
  <p class="para">Regular paragraph 4.</p>
  <p class="para">Regular paragraph 5.</p>
  <p class="para">Regular paragraph 6.</p>
</body>
</html>

My desired output (plain text):

Heading (h1): Heading 1
Para (first): First paragraph.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (last): Last paragraph.
Heading (someclass): Heading 2
Para (first): First paragraph 2.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.

My XSLT:

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">

        <xsl:for-each select="//p[@class='first']">
            Para (first): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//p[@class='para']">
            Para (regular): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//p[@class='last']">
            Para (last): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//h1">
            Heading (h1): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//h2[@class='someclass']">
            Heading (someclass): <xsl:value-of select="."/>
        </xsl:for-each>

    </xsl:template>
</xsl:stylesheet>

Result of applying the above XSLT to the input HTML document:

Para (first): First paragraph.
Para (first): First paragraph 2.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.
Para (last): Last paragraph.
Heading (h1): Heading 1
Heading (someclass): Heading 2

What I want is to write the content of the tags from the HTML document into plain text in the order in which that content appears in the HTML document. Instead, this transformation outputs all elements matching the same XPath expression one after another.

I suspect that the solution is to use the apply-templates element; however, I do not understand how it works and hence have trouble using it for the above example.


Solution

This transformation is doing exactly what you've told it to: first process all the p[@class='first'] elements, then all the p[@class='para'] elements, and so on. You are correct that you should instead define a separate template for each of the different cases and then use apply-templates, which separates the question of which elements to process from the question of what to do with each one.

<xsl:template match="/">
  <!-- process all the child elements of body in document order -->
  <xsl:apply-templates select="html/body/*" />
</xsl:template>

<!-- if the element we're processing is a <p class="first"> ... -->
<xsl:template match="p[@class='first']">
    Para (first): <xsl:value-of select="."/>
</xsl:template>

<!-- etc. etc. -->
<xsl:template match="p[@class='para']">
    Para (regular): <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="p[@class='last']">
    Para (last): <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="h1">
    Heading (h1): <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="h2[@class='someclass']">
    Heading (someclass): <xsl:value-of select="."/>
</xsl:template>
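
The templates above are fragments, and the indentation inside them would be copied into the result as literal whitespace. As a minimal sketch of how they might be assembled into a complete stylesheet, the version below wraps them in xsl:stylesheet, adds xsl:output method="text", and uses xsl:text so that each item ends with exactly one newline. The use of xsl:text and the &#10; line break are choices made for this illustration, not part of the original answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emit plain text instead of XML (no XML declaration, no escaping). -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Process the element children of body in document order. -->
  <xsl:template match="/">
    <xsl:apply-templates select="html/body/*"/>
  </xsl:template>

  <!-- Each template writes a label, the element's text, and a newline.
       Whitespace-only text between instructions is ignored by the processor,
       so only the content of xsl:text and xsl:value-of reaches the output. -->
  <xsl:template match="h1">
    <xsl:text>Heading (h1): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="h2[@class='someclass']">
    <xsl:text>Heading (someclass): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='first']">
    <xsl:text>Para (first): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='para']">
    <xsl:text>Para (regular): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='last']">
    <xsl:text>Para (last): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the input document from the question, this should produce the desired lines in document order, because apply-templates visits the children of body one by one and picks the matching template for each. Elements that match none of these templates (for example an h2 without class="someclass") fall through to the built-in rules, which copy their text without a label or trailing newline.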
Licensed under: CC-BY-SA with attribution