Question

Here's the problem: I have a database full of articles marked up in XHTML. Our application uses Prince XML to generate PDFs. An artifact of that is that footnotes are marked up inline, using the following pattern:

<p>Some paragraph text<span class="fnt">This is the text of a footnote</span>.</p>

Prince replaces every span.fnt with a numeric footnote marker, and renders the enclosed text as a footnote at the bottom of the page.

We want to render the same content in ebook formats, and XHTML is a great starting point, but the inline footnotes are terrible. What I want to do is convert the footnotes to endnotes in my ebook build script.

This is what I'm thinking:

  1. Create an empty array called $endnotes to store the endnote text.
  2. Set a variable $endnote_no to zero. This variable will hold the current endnote number, to display inline as an endnote marker, and to be used in linking the endnote marker to the particular endnote.
  3. Use preg_replace or preg_replace_callback to find every instance of <span class="fnt">(.*?)</span>.
  4. Increment $endnote_no for each instance, and replace the inline span with '<sup><a href="#endnote_' . $endnote_no . '">' . $endnote_no . '</a></sup>'
  5. Push the footnote text to the $endnotes array so that I can use it at the end of the document.
  6. After replacing all the footnotes with numeric endnote references, iterate through the $endnotes array to spit out the endnotes as an ordered list in XHTML.

This process is a bit beyond my PHP comprehension, and I get lost when I try to translate this into code. Here's what I have so far, which I mainly cobbled together based on code examples I found in the PHP documentation:

$endnotes = array();
$endnote_no = 0;
class Endnoter {

  public function replace($subject) {
    $this->endnote_no = 0;
    return preg_replace_callback('`<span class="fnt">(.*?)</span>`', array($this, '_callback'), $subject);
  }

  public function _callback($matches) {
    array_push($endnotes, $1);
    return '<sup><a href="#endnote_' . $this->endnote_no++ . '">' . $this->endnote_no . '</a></sup>';
  }
}

...

$replacer = new Endnoter();
$replacer->replace($body);
echo '<pre>';
print_r($endnotes); // Just checking to see if the $endnotes are there.
echo '</pre>';

Any guidance would be helpful, especially if there is a simpler way to get there.


Solution

Don't know about a simpler way, but you were halfway there. This seems to work.

I just cleaned it up a bit, moved the variables inside your class and added an output method to get the footnote list.

class Endnoter
{
    private $number_of_notes = 0;
    private $footnote_texts = array();

    public function replace($input) {

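        // (.*?) is lazy, so two footnotes in one paragraph are matched separately;
        // the s modifier lets a footnote span several lines
        return preg_replace_callback('#<span class="fnt">(.*?)</span>#is', array($this, 'replace_callback'), $input);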

    }

    protected function replace_callback($matches) {

        // the text sits in the matches array
        // see http://php.net/manual/en/function.preg-replace-callback.php
        $this->footnote_texts[] = $matches[1];

        return '<sup><a href="#endnote_'.(++$this->number_of_notes).'">'.$this->number_of_notes.'</a></sup>';

    }

    public function getEndnotes() {
        $out = array();
        $out[] = '<ol>';

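        foreach($this->footnote_texts as $index => $text) {
            // the id is the target of the "#endnote_N" link built in replace_callback()
            $out[] = '<li id="endnote_'.($index + 1).'">'.$text.'</li>';
        }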

        $out[] = '</ol>';

        return implode("\n", $out);
    }

}
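
As for a simpler way: the same idea fits in one function if the callback is a closure that collects the notes by reference. This is only a sketch; the function name footnotes_to_endnotes is made up for illustration, and like any regex approach it assumes a footnote never contains a nested span.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns array($converted_body, $endnote_list_markup).
function footnotes_to_endnotes(string $body): array
{
    $notes = array();

    $body = preg_replace_callback(
        '#<span class="fnt">(.*?)</span>#is',
        function (array $matches) use (&$notes) {
            // Store the footnote text; its position in the array gives the number.
            $notes[] = $matches[1];
            $number = count($notes);
            return '<sup><a href="#endnote_' . $number . '">' . $number . '</a></sup>';
        },
        $body
    );

    // Build the ordered list; each id matches the href of its marker.
    $items = array();
    foreach ($notes as $index => $text) {
        $items[] = '<li id="endnote_' . ($index + 1) . '">' . $text . '</li>';
    }

    return array($body, "<ol>\n" . implode("\n", $items) . "\n</ol>");
}

list($body, $endnotes) = footnotes_to_endnotes(
    '<p>First<span class="fnt">Note one</span> and second<span class="fnt">Note two</span>.</p>'
);
echo $body, "\n", $endnotes, "\n";
```

The class is still the better fit if you need to run several article bodies through one counter, since it keeps the numbering between calls.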

Other answers

First, you're best off not using a regex for HTML manipulation; see the question "How do you parse and process HTML/XML in PHP?"
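
For what it's worth, here is roughly what the parser route could look like with PHP's DOM extension. It is a sketch under assumptions: the function name endnotes_via_dom is invented, the input is a well-formed fragment with a single root element and no default xmlns declaration, and only an exact class="fnt" is matched.

```php
<?php
// Hypothetical helper: converts span.fnt footnotes to endnotes with DOM, not regex.
function endnotes_via_dom(string $xhtml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xhtml);

    $xpath = new DOMXPath($doc);
    $list = $doc->createElement('ol');
    $number = 0;

    // Matches class="fnt" exactly; spans carrying several classes are not handled.
    foreach ($xpath->query('//span[@class="fnt"]') as $span) {
        $number++;

        // Move the footnote's children (text and inline markup) into a list item.
        $item = $doc->createElement('li');
        $item->setAttribute('id', 'endnote_' . $number);
        while ($span->firstChild !== null) {
            $item->appendChild($span->firstChild);
        }
        $list->appendChild($item);

        // Replace the emptied span with <sup><a href="#endnote_N">N</a></sup>.
        $link = $doc->createElement('a', (string) $number);
        $link->setAttribute('href', '#endnote_' . $number);
        $sup = $doc->createElement('sup');
        $sup->appendChild($link);
        $span->parentNode->replaceChild($sup, $span);
    }

    // Append the endnote list at the end of the root element.
    $doc->documentElement->appendChild($list);
    return $doc->saveXML($doc->documentElement);
}

$html = '<div><p>Some paragraph text<span class="fnt">This is the text of a footnote</span>.</p></div>';
$result = endnotes_via_dom($html);
echo $result, "\n";
```

Unlike the regex, this copes with markup inside a footnote, because the child nodes are moved as nodes rather than matched as text.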

However, if you really want to go that route, there are a few things wrong with your code:

  1. return '<sup><a href="#endnote_' . $this->endnote_no++ . '">' . $this->endnote_no . '</a></sup>';
    

    If endnote_no is 1, for example, this will produce

    '<sup><a href="#endnote_1">2</a></sup>';
    

    If those values are both supposed to be the same, you want to increment endnote_no first:

    return '<sup><a href="#endnote_' . ++$this->endnote_no . '">' . $this->endnote_no . '</a></sup>';
    

    Note the ++ in front of the variable instead of after it.

  2. array_push($endnotes, $1);
    

    $1 is not valid PHP: a variable name cannot start with a digit, so this line is a parse error. The captured text is in the array passed to the callback, so you want $matches[1]

  3. print_r($endnotes);
    

    The global $endnotes stays empty because the class never writes to it: inside _callback(), $endnotes is a new local variable. Keep the notes in a property instead, then either add a getter to retrieve them (usually preferable) or make the property public. With a getter:

    class Endnoter {
        private $endnotes = array();
        //replace any references to $endnotes in your class with $this->endnotes and add a function:
    
        public function getEndnotes() {
            return $this->endnotes;
        }
    }
    //and then outside
    print_r($replacer->getEndnotes());
    
  4. preg_replace_callback returns the modified string and leaves its subject untouched, so $replacer->replace($body); throws the result away. It should be $body = $replacer->replace($body); unless you want replace() to take $body by reference and update its value there.
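
Points 1 and 4 are easy to check in isolation. The snippet below is just a demonstration with throwaway variable names, not code from the question:

```php
<?php
// Point 1: post-increment yields the old value, pre-increment the new one.
$n = 1;
$post = '#endnote_' . $n++ . ' -> ' . $n;
$n = 1;
$pre = '#endnote_' . ++$n . ' -> ' . $n;
echo $post, "\n"; // #endnote_1 -> 2
echo $pre, "\n";  // #endnote_2 -> 2

// Point 4: the subject string is not modified; only the return value changes.
$body = '<p>Text<span class="fnt">A note</span>.</p>';
$result = preg_replace_callback(
    '#<span class="fnt">(.*?)</span>#s',
    function (array $matches) {
        return '<sup>*</sup>';
    },
    $body
);
echo $body, "\n";   // still contains the span
echo $result, "\n"; // <p>Text<sup>*</sup>.</p>
```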

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow