Question

I am transforming HTML into TEI and ran into the problem of handling footnotes.

The input HTML looks like:

<content>
    <div>
        <p>p1</p>
        <p>p2</p>
        <p>p3<a href="#_ftn1" name="_ftnref1" title="">[1]</a> p3</p>
        <p>p4</p>
        <p>p5<a href="#_ftn2" name="_ftnref2" title="">[2]</a> p5</p>
        <p>p6</p>

        <p><a href="#_ftnref1" name="_ftn1" title="">[1]</a> footnote1</p>

        <p><a href="#_ftnref2" name="_ftn2" title="">[2]</a> footnote2</p>
    </div>
</content>

The desired output is:

<content>
    <div>
        <p>p1</p>
        <p>p2</p>
        <p>p3<note>footnote1</note> p3</p>
        <p>p4</p>
        <p>p5<note>footnote2</note> p5</p>
        <p>p6</p>
    </div>
</content>

Unfortunately, I have no idea how to handle this. All the other elements are simply exchanged one for one, e.g. like this:

<xsl:template match="xhtml:br">
    <lb/>
</xsl:template>

Thanks a lot for your help!


Solution

The following transform will give the desired output.

Note that it makes a few assumptions about how the content is structured. In particular, how do you know when a p is a footnote? It is structurally the same as other paragraphs. The code below uses the identifier naming scheme, which may or may not be consistent throughout your real input.

The same caveat applies to dropping the back-reference anchor when copying the footnote content. The following code takes a simple approach and copies only the text nodes that are siblings of the anchor, so any inline markup inside a footnote is lost. That may be too simplistic for your real data.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                                xmlns:xhtml="http://www.w3.org/1999/xhtml"
                                exclude-result-prefixes="xhtml">

    <xsl:key name="fn" match="xhtml:a" use="@name" />

    <!-- Copy template with namespace stripped; local-name() also drops any prefix -->
    <xsl:template match="*">
        <xsl:element name="{local-name()}">
            <xsl:apply-templates select="node()|@*" />
        </xsl:element>
    </xsl:template>

    <!-- Emit footnote content in place of the reference -->
    <xsl:template match="xhtml:a[key('fn', substring-after(@href, '#'))]">
        <note>
            <xsl:copy-of select="key('fn', substring-after(@href, '#'))/../text()"/>
        </note>
    </xsl:template>

    <!-- Hack to omit the footnotes themselves -->
    <xsl:template match="xhtml:*[xhtml:a[contains(@href, '_ftnref')]]" />

</xsl:stylesheet>
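
If your footnotes contain inline markup, the caveat above about copying only the anchor's sibling text starts to matter. Here is a rough sketch of a variant that processes all children of the footnote paragraph except the back-reference anchor. It assumes the same `_ftn` / `_ftnref` naming scheme as the sample, and it assumes the input elements are in the XHTML namespace (for example via `xmlns="http://www.w3.org/1999/xhtml"` on the root). The key name `fn-target` and the mode name `note` are made up for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xhtml="http://www.w3.org/1999/xhtml"
                exclude-result-prefixes="xhtml">

    <!-- Index only the footnote targets: anchors named "_ftn..." but not
         "_ftnref..." (assumption: the naming scheme of the sample input) -->
    <xsl:key name="fn-target"
             match="xhtml:a[starts-with(@name, '_ftn')
                            and not(starts-with(@name, '_ftnref'))]"
             use="@name"/>

    <!-- Rebuild every element without its namespace. Attributes are
         dropped here: applying templates to @* with no attribute template
         would fall back to the built-in rule, which emits values as text. -->
    <xsl:template match="*">
        <xsl:element name="{local-name()}">
            <xsl:apply-templates select="node()"/>
        </xsl:element>
    </xsl:template>

    <!-- In-text reference: replace it with the children of the
         paragraph that holds the matching footnote target -->
    <xsl:template match="xhtml:a[key('fn-target', substring-after(@href, '#'))]">
        <note>
            <xsl:apply-templates
                select="key('fn-target', substring-after(@href, '#'))/../node()"
                mode="note"/>
        </note>
    </xsl:template>

    <!-- Inside a note, skip the footnote's own anchor (the "[1]" link) -->
    <xsl:template match="xhtml:a[starts-with(@name, '_ftn')]" mode="note"/>

    <!-- Inside a note, hand everything else back to the default mode -->
    <xsl:template match="node()" mode="note">
        <xsl:apply-templates select="."/>
    </xsl:template>

    <!-- Drop the footnote paragraphs from the main flow -->
    <xsl:template match="xhtml:*[xhtml:a[key('fn-target', @name)]]"/>

</xsl:stylesheet>
```

Because the key indexes only the footnote targets, a footnote paragraph is recognised by containing one of those anchors rather than by a substring test on `@href`. As in the original, the space that follows the `[1]` anchor in the footnote paragraph is carried into the `<note>`, so you may still want to trim it if it matters for your TEI output.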

OTHER TIPS

Try this template to get your result:

<xsl:template match="a[contains(@href,'ftn')]">
    <note><xsl:value-of select="substring(text(),2,1)"/></note>
</xsl:template>

Licensed under: CC-BY-SA with attribution