Question

I am writing an application that gives users a TinyMCE HTML editor. The problem is that no matter how often I ask my users to format their headers with the "Heading 2" (h2) style, they either use h1 (which I can deal with!) or they start a new paragraph and bold its entire content.

For example:

<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>

What I would like to do is find every <p><strong> that contains, say, fewer than eight words and replace it with an <h2>.

What is the best way to do this?

UPDATE: Thanks to Jack's code, I have worked on a simple module that does everything that I described here and more. The code is here on GitHub.

Solution

You can use DOMDocument for this. Find each <strong> tag that is a child of a <p>, check that it is the paragraph's only content, count its words, and replace the parent <p> with an <h2>:

$content = <<<'EOM'
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>
EOM;

$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);


foreach ($xp->query('//p/strong') as $node) {
    $parent = $node->parentNode;
    if ($parent->textContent == $node->textContent &&
            str_word_count($node->textContent) <= 8) {
        $header = $doc->createElement('h2', $node->textContent);
        $parent->parentNode->replaceChild($header, $parent);
    }
}

echo $doc->saveHTML();
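
One caveat with the word-count test: str_word_count() counts runs of letters (plus ' and -), so digits are ignored and a header such as "Chapter 2 results" counts as two words. Below is a minimal sketch of a whitespace-based alternative, assuming a "word" simply means a whitespace-separated token. The helper name count_words() is made up for illustration and is not part of the answer above.

```php
<?php

// Hypothetical helper (not part of the original answer): counts
// whitespace-separated tokens instead of runs of letters.
function count_words(string $text): int
{
    // The /u modifier treats the input as UTF-8. PREG_SPLIT_NO_EMPTY
    // drops empty tokens caused by leading or trailing whitespace.
    $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (e.g. invalid UTF-8).
    return $tokens === false ? 0 : count($tokens);
}

echo count_words('Chapter 2 results'), "\n";    // 3
echo str_word_count('Chapter 2 results'), "\n"; // 2, the digit is not counted
```

In the loop above, count_words($node->textContent) could stand in for the str_word_count() call if headers with numbers matter to you.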

Other tips

Since you seem to be proficient in PHP, you may find the PHP Simple HTML DOM Parser very intuitive for this task. Here's a snippet from the documentation showing the properties you can read, and also write, once you have located an element, including the tag name:

$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag; // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"

Attribute Name    Usage
$e->tag           Read or write the tag name of element.
$e->outertext     Read or write the outer HTML text of element.
$e->innertext     Read or write the inner HTML text of element.
$e->plaintext     Read or write the plain text of element.

This is the code that I have worked on.

<?php

$content_old = <<<'EOM'
<p>&nbsp; </p>
<p>lol<strong>test</strong></p>
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>
EOM;

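// Strip empty paragraphs. A group is needed here: [\s|&nbsp;] is a character
// class that matches single characters, not the &nbsp; entity.
$content = preg_replace('~<p[^>]*>(?:\s|&nbsp;)*</p>~', '', $content_old);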

$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);

foreach ($xp->query('//p/strong') as $node) {
    $parent = $node->parentNode;
    if ($parent->textContent == $node->textContent && 
            str_word_count($node->textContent) <= 8) {
        $header = $doc->createElement('h2');
        $parent->parentNode->replaceChild($header, $parent);
        $header->appendChild($doc->createTextNode( $node->textContent ));
    }
}

// Serializing the whole document is not good enough, because loadHTML()
// wraps the fragment in DOCTYPE, <html> and <body> tags
$xp = new DOMXPath($doc);
$everything = $xp->query("body/*"); // retrieves all elements inside body tag
$output = '';
if ($everything->length > 0) { // check if it retrieved anything in there
    foreach ($everything as $thing) {
        $output .= $doc->saveXML($thing) . "\n";
    }
}

echo "--- ORIGINAL --\n\n";
echo $content_old;
echo "\n\n--- UPDATED ---\n\n";
echo $output;

When I run the script, this is the output that I get:

--- ORIGINAL --

<p>&nbsp; </p>
<p>lol<strong>test</strong></p>
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>

--- UPDATED ---

<p>lol<strong>test</strong></p>
<h2>This is a header</h2>
<p>Content content blah blah blah.</p>
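
A related detail: saveXML() emits XML syntax, so void elements such as <br> come out as <br/>. saveHTML() also accepts a node argument (since PHP 5.3.6), which serializes a single node using HTML rules. The sketch below relies on that; the function name body_inner_html() is hypothetical and the sample markup is made up.

```php
<?php

// Hypothetical helper: returns the markup inside <body> without the
// DOCTYPE, <html> and <body> wrappers that loadHTML() adds.
function body_inner_html(DOMDocument $doc): string
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $output = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes one node using HTML rules.
        $output .= $doc->saveHTML($child);
    }
    return $output;
}

$doc = new DOMDocument;
$doc->loadHTML('<h2>Title</h2><p>First line<br>second line</p>');
echo body_inner_html($doc);
```

Unlike the "body/*" query, this walks all child nodes of <body>, so any text that sits directly inside <body> is kept as well.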

Update #1

It's worth noting that if there are other tags inside the <p><strong> tag (for example, <p><strong><a>), then the entire <p> will be replaced, which was not my intention.

This is easily fixed by changing the if to this:

        if ($parent->textContent == $node->textContent &&
                str_word_count($node->textContent) <= 8 &&
                $node->childNodes->item(0)->nodeType == XML_TEXT_NODE) {
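
Note that this condition inspects only the first child of <strong>, so markup such as <strong>Read <a>this</a></strong> still passes, and an empty <strong></strong> has no first child at all. A stricter variant, sketched here with a hypothetical helper named has_only_text(), checks every child:

```php
<?php

// Hypothetical helper: true when the node has at least one child and
// every child is a plain text node (no nested elements such as <a>).
function has_only_text(DOMNode $node): bool
{
    if (!$node->hasChildNodes()) {
        return false;
    }
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_TEXT_NODE) {
            return false;
        }
    }
    return true;
}

$doc = new DOMDocument;
$doc->loadHTML('<p><strong>Plain header</strong></p>'
    . '<p><strong>Read <a href="#">this</a></strong></p>');

foreach ($doc->getElementsByTagName('strong') as $strong) {
    echo $strong->textContent, ': ';
    echo has_only_text($strong) ? "text only\n" : "contains markup\n";
}
```

In the loop, has_only_text($node) would take the place of the third condition.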

Update #2

It's also worth noting that the original createElement call would cause issues if the content inside the <p><strong> contained characters that must be escaped in HTML (for example, &).

The old code was:

        $header = $doc->createElement('h2', $node->textContent);
        $parent->parentNode->replaceChild($header, $parent);

The new code (which works correctly) is:

        $header = $doc->createElement('h2');
        $parent->parentNode->replaceChild($header, $parent);
        $header->appendChild($doc->createTextNode( $node->textContent ));
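
To illustrate the difference: the value passed as the second argument of createElement() is not escaped for you, whereas createTextNode() takes literal text and the serializer escapes it on output. A small sketch with a made-up heading text:

```php
<?php

$doc = new DOMDocument;

// Literal text containing a character that must be escaped in HTML.
$text = 'Terms & Conditions';

$header = $doc->createElement('h2');
// createTextNode() stores the string as-is; no entity parsing happens.
$header->appendChild($doc->createTextNode($text));
$doc->appendChild($header);

echo $doc->saveHTML($header), "\n"; // <h2>Terms &amp; Conditions</h2>
```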

Licensed under: CC-BY-SA with attribution