I am trying to convert an HTML document into a plain-text document using XSLT. However, I am quite new to XSLT, and I can't understand why the output of my transformation differs from my desired output.
My input HTML document:
<html>
<body>
<h1>Heading 1</h1>
<p class="first">First paragraph.</p>
<p class="para">Regular paragraph 1.</p>
<p class="para">Regular paragraph 2.</p>
<p class="para">Regular paragraph 3.</p>
<p class="last">Last paragraph.</p>
<h2 class="someclass">Heading 2</h2>
<p class="first">First paragraph 2.</p>
<p class="para">Regular paragraph 4.</p>
<p class="para">Regular paragraph 5.</p>
<p class="para">Regular paragraph 6.</p>
</body>
</html>
My desired output (plain text):
Heading (h1): Heading 1
Para (first): First paragraph.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (last): Last paragraph.
Heading (someclass): Heading 2
Para (first): First paragraph 2.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.
My XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//p[@class='first']">
Para (first): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//p[@class='para']">
Para (regular): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//p[@class='last']">
Para (last): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//h1">
Heading (h1): <xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="//h2[@class='someclass']">
Heading (someclass): <xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Result of applying the above XSLT to the input HTML document:
Para (first): First paragraph.
Para (first): First paragraph 2.
Para (regular): Regular paragraph 1.
Para (regular): Regular paragraph 2.
Para (regular): Regular paragraph 3.
Para (regular): Regular paragraph 4.
Para (regular): Regular paragraph 5.
Para (regular): Regular paragraph 6.
Para (last): Last paragraph.
Heading (h1): Heading 1
Heading (someclass): Heading 2
What I want is to output the content of the HTML elements as plain text, in the order in which they appear in the HTML document. Instead, this transformation groups all elements matching the same XPath expression together, one group after another.
I suspect that the solution is to use the xsl:apply-templates element; however, I do not understand how it works, and so I have trouble applying it to the example above.
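For reference, a minimal sketch of what the template-based approach might look like for this input. It is only an illustration under some assumptions: the input elements are in no namespace, xsl:strip-space is acceptable for discarding the whitespace between elements, and the label for h2 and for non-"para" paragraphs can be taken directly from the class attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain-text output; whitespace-only text between elements is dropped -->
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- No template for "/", html, or body: the built-in rules apply templates
       to their children, visiting them in document order -->

  <xsl:template match="h1">
    <xsl:text>Heading (h1): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Assumption: the h2 label is whatever its class attribute holds -->
  <xsl:template match="h2">
    <xsl:text>Heading (</xsl:text>
    <xsl:value-of select="@class"/>
    <xsl:text>): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Generic paragraph: label taken from the class attribute ("first", "last") -->
  <xsl:template match="p">
    <xsl:text>Para (</xsl:text>
    <xsl:value-of select="@class"/>
    <xsl:text>): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- More specific pattern, so it takes precedence over match="p" -->
  <xsl:template match="p[@class='para']">
    <xsl:text>Para (regular): </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The difference from the for-each version is that nothing here selects all elements of one kind at once. The processor walks the document from the root, and each element it meets is handed to the best-matching template, so the output follows the order of the input.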