Question

I have this kind of HTML document.

<span class="class1">text1</span>
<a href="">link1</a>
<font color=""><b>text2</b></font>
<a href="">link2</a>
text3
<span class="class2">text4</span>

And I'd like to surround text1, text2 and text3 with &nbsp;. What would be the best way? With DOMDocument I can't select strings that are not inside a tag. For text1 and text2, getElementsByTagName('tagname')->item(0) can be used, but for text3 I'm not sure what to do.

Any ideas?

[Edit]

As Musa suggests, I tried using nextSibling.

<?php
$html = <<<STR
    <span class="class1">text1</span>
    <a href="">link1</a>
    <font color=""><b>text2</b></font>
    <a href="">link2</a>
    text3
    <span class="class2">text4</span>
STR;

$doc = new DOMDocument;
$doc->loadHTML($html);
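foreach ($doc->getElementsByTagName('a') as $nodeA) {
    // nextSibling is whatever node follows each <a>; for link1 it is not text2,
    // which sits inside <font><b>. Assigning to nodeValue stores the string as
    // plain text, so '&nbsp;' is escaped when the document is saved.
    $nodeA->nextSibling->nodeValue = '&nbsp;' . $nodeA->nextSibling->nodeValue . '&nbsp;';
}
echo $doc->saveHTML();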
?>

However, &nbsp; gets escaped and is output as &amp;nbsp;.


Solution

Since setting nodeValue on a text node stores the string as plain text rather than HTML, you can use the non-breaking space character itself (U+00A0, the UTF-8 bytes C2 A0) instead of the HTML entity.
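To make the "character instead of entity" point concrete: the string "\xc2\xa0" used below is just the UTF-8 encoding of the character that &nbsp; names. A small check showing three equivalent ways to get it in PHP (the variable names are illustrative only):

```php
<?php
// U+00A0 NO-BREAK SPACE is the two bytes C2 A0 in UTF-8.
$fromBytes  = "\xc2\xa0";                 // raw UTF-8 bytes, as in the answer
$fromEscape = "\u{00A0}";                 // Unicode escape syntax (PHP 7.0+)
$fromEntity = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// All three comparisons print bool(true).
var_dump($fromBytes === $fromEscape, $fromBytes === $fromEntity);
```

Using "\u{00A0}" may read more clearly than the raw bytes if you are on PHP 7 or later.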

<?php
$html = <<<STR
    <span class="class1">text1</span>
    <a href="">link1</a>
    <font color=""><b>text2</b></font>
    <a href="">link2</a>
    text3
    <span class="class2">text4</span>
STR;
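$nbsp = "\xc2\xa0"; // U+00A0 NO-BREAK SPACE encoded as UTF-8
$doc = new DOMDocument;
$doc->loadHTML('<div>' . $html . '</div>');

foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $node) {
    // Bare text nodes (XML_TEXT_NODE, nodeType 3) that contain more than
    // whitespace; the blank nodes between tags are skipped.
    if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
        $node->nodeValue = $nbsp . trim($node->nodeValue) . $nbsp;
    }
}
echo $doc->saveHTML();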
?>
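The loop above only visits the direct children of the wrapper <div>, so it reaches text3 but not text1 (inside <span>) or text2 (inside <font><b>). A sketch that picks up all three with DOMXPath; the query is a hypothetical one written for this sample markup, so it would need adjusting for a real document:

```php
<?php
$html = <<<STR
    <span class="class1">text1</span>
    <a href="">link1</a>
    <font color=""><b>text2</b></font>
    <a href="">link2</a>
    text3
    <span class="class2">text4</span>
STR;

$nbsp = "\xc2\xa0"; // U+00A0 NO-BREAK SPACE as UTF-8
$doc = new DOMDocument();
$doc->loadHTML('<div>' . $html . '</div>');
$xpath = new DOMXPath($doc);

// Hypothetical query tailored to this sample: the text inside span.class1,
// the text inside <font><b>, and non-blank text nodes directly under the div.
$query = '//span[@class="class1"]/text()'
       . ' | //font/b/text()'
       . ' | //div/text()[normalize-space()]';

// DOMXPath::query() returns a fixed list, so changing values here is safe.
foreach ($xpath->query($query) as $node) {
    $node->nodeValue = $nbsp . trim($node->nodeValue) . $nbsp;
}

$result = $doc->saveHTML();
echo $result;
```

text4 and the link texts are left alone because the query does not match them.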

OTHER TIPS

You should be able to use getElementsByTagName and then iterate over the node list, adding &nbsp; as necessary.

Start with getElementsByTagName('body')

http://php.net/manual/en/domdocument.getelementsbytagname.php

which returns a DOMNodeList

http://www.php.net/manual/en/class.domnodelist.php

Its first item, item(0), is the body element, and you can then iterate over that element's childNodes

http://www.php.net/manual/en/domnodelist.item.php

The nodeType property tells you what kind of node you are dealing with. text3 is a TEXT_NODE, which has the value 3.

https://developer.mozilla.org/en-US/docs/DOM/Node.nodeType?redirectlocale=en-US&redirectslug=nodeType

Hope that gets you going in the right direction.
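A minimal sketch of the approach in this tip: take the body element with getElementsByTagName('body')->item(0), walk its childNodes, and use PHP's XML_TEXT_NODE constant (equal to 3) to find bare text such as text3. Skipping whitespace-only text nodes with trim() is my own addition, and the sample markup is condensed onto one line for brevity:

```php
<?php
$html = '<span class="class1">text1</span> <a href="">link1</a> '
      . '<font color=""><b>text2</b></font> <a href="">link2</a> '
      . 'text3 <span class="class2">text4</span>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// getElementsByTagName() returns a DOMNodeList; item(0) is the <body> element
// that loadHTML() creates around the fragment.
$body = $doc->getElementsByTagName('body')->item(0);

$found = [];
foreach ($body->childNodes as $node) {
    // XML_TEXT_NODE is the predefined constant for nodeType 3.
    if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
        $found[] = trim($node->nodeValue);
    }
}

print_r($found); // only text3 is a non-blank text node directly under <body>
```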

One solution I came up with:

<?php
$html = <<<STR
    <span class="class1">text1</span>
    <a href="">link1</a>
    <font color=""><b>text2</b></font>
    <a href="">link2</a>
    text3
    <span class="class2">text4</span>
STR;

$doc = new DOMDocument;
$doc->loadHTML('<div>' . $html . '</div>');

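foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $node) {
    // Wrap only non-blank text nodes (XML_TEXT_NODE, nodeType 3) in a placeholder.
    if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
        $node->nodeValue = '[identical_replacement_string]' . trim($node->nodeValue) . '[identical_replacement_string]';
    }
}
// After serialising, swap each placeholder for the real entity.
$output = str_replace('[identical_replacement_string]', '&nbsp;', $doc->saveHTML());
echo $output;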
?>

Please feel free to post better solutions.
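A possible alternative to the placeholder step, not taken from the original thread: insert entity-reference nodes created with DOMDocument::createEntityReference('nbsp'), which saveHTML() writes out as &nbsp; instead of escaping. This sketch, like the loops above, only touches bare text directly under the wrapper <div> (text3 here), and it relies on libxml2 serialising entity-reference nodes as &name;, which is worth checking on your own PHP build:

```php
<?php
$html = <<<STR
    <span class="class1">text1</span>
    <a href="">link1</a>
    <font color=""><b>text2</b></font>
    <a href="">link2</a>
    text3
    <span class="class2">text4</span>
STR;

$doc = new DOMDocument();
$doc->loadHTML('<div>' . $html . '</div>');
$div = $doc->getElementsByTagName('div')->item(0);

// Collect the non-blank direct text children first, so the live childNodes
// list is not modified while it is being iterated.
$targets = [];
foreach ($div->childNodes as $node) {
    if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
        $targets[] = $node;
    }
}

foreach ($targets as $node) {
    $node->nodeValue = trim($node->nodeValue);
    // Entity-reference nodes are written out as "&nbsp;", not escaped.
    // insertBefore() with a null reference node appends at the end.
    $node->parentNode->insertBefore($doc->createEntityReference('nbsp'), $node);
    $node->parentNode->insertBefore($doc->createEntityReference('nbsp'), $node->nextSibling);
}

$output = $doc->saveHTML();
echo $output;
```

This avoids string replacement on the serialised HTML, at the cost of adding extra nodes to the tree.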

Licensed under: CC-BY-SA with attribution