Question

I am trying to convert an HTML document into a plain text document using XSLT. However, I am quite new to XSLT and I can't understand why the output of my transformation differs from my desired output.

My input HTML document:

<html>
<body>
  <h1>Heading 1</h1>
  <p class="first">First paragraph.</p>
  <p class="para">Regular paragraph 1.</p>
  <p class="para">Regular paragraph 2.</p>
  <p class="para">Regular paragraph 3.</p>
  <p class="last">Last paragraph.</p>
  <h2 class="someclass">Heading 2</h2>
  <p class="first">First paragraph 2.</p>
  <p class="para">Regular paragraph 4.</p>
  <p class="para">Regular paragraph 5.</p>
  <p class="para">Regular paragraph 6.</p>
</body>
</html>

My desired output (plain text):

Heading (h1): Heading 1
Para (first): First paragraph.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (last): Last paragraph.
Heading (someclass): Heading 2
Para (first): First paragraph 2.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.

My XSLT:

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">

        <xsl:for-each select="//p[@class='first']">
            Para (first): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//p[@class='para']">
            Para (regular): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//p[@class='last']">
            Para (last): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//h1">
            Heading (h1): <xsl:value-of select="."/>
        </xsl:for-each>

        <xsl:for-each select="//h2[@class='someclass']">
            Heading (someclass): <xsl:value-of select="."/>
        </xsl:for-each>

    </xsl:template>
</xsl:stylesheet>

Result of applying the above XSLT to the input HTML document:

Para (first): First paragraph.
Para (first): First paragraph 2.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.
Para (last): Last paragraph.
Heading (h1): Heading 1
Heading (someclass): Heading 2

What I want is to write the content of the HTML elements to plain text in the order in which they appear in the HTML document. Instead, this transformation outputs all elements matching the same XPath expression one after another.

I suspect that the solution is to use the apply-templates element; however, I do not understand how it works and hence have trouble applying it to the above example.


Solution

This transformation is doing exactly what you told it to do: first process all the p[@class='first'] elements, then all the p[@class='para'] elements, and so on. You are correct that you should instead define a separate template for each case and then use apply-templates, which separates the question of which elements to process from the question of what to do with each one.

<xsl:template match="/">
  <!-- process all the child elements of body in document order -->
  <xsl:apply-templates select="html/body/*" />
</xsl:template>

<!-- if the element we're processing is a <p class="first"> ... -->
<xsl:template match="p[@class='first']">
    Para (first): <xsl:value-of select="."/>
</xsl:template>

<!-- etc. etc. -->
<xsl:template match="p[@class='para']">
    Para (regular): <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="p[@class='last']">
    Para (last): <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="h1">
    Heading (h1): <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="h2[@class='someclass']">
    Heading (someclass): <xsl:value-of select="."/>
</xsl:template>
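
The templates above are fragments, so here is a minimal sketch of how they might fit into one complete stylesheet. Two things in it are my own additions rather than part of the snippets above: xsl:output method="text", so that no XML declaration is written, and xsl:text around the labels, so that the indentation inside each template does not leak into the output and each entry ends with a newline. The final empty template for unmatched elements is also an assumption about what you want; without it, the built-in rules would copy the text of any other element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain text output, so no XML declaration or escaping -->
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Visit the child elements of body in document order -->
    <xsl:apply-templates select="html/body/*"/>
  </xsl:template>

  <xsl:template match="h1">
    <xsl:text>Heading (h1): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="h2[@class='someclass']">
    <xsl:text>Heading (someclass): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='first']">
    <xsl:text>Para (first): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='para']">
    <xsl:text>Para (regular): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="p[@class='last']">
    <xsl:text>Para (last): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Assumption: silently skip any other element under body -->
  <xsl:template match="*"/>

</xsl:stylesheet>
```

With the input document from the question, this should print one labelled line per element in document order, because apply-templates walks the selected nodes in the order they appear and picks the matching template for each one. Any XSLT 1.0 processor should be able to run it; with xsltproc, for example, you would pass the stylesheet file first and the input file second (the file names are up to you).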
License: CC-BY-SA with attribution
Not affiliated with StackOverflow