Using PHP, how do I remove HTML Text After/Before Certain Number of <br>

https://stackoverflow.com/questions/12102951

28-06-2021

Question

Using PHP, how can I remove HTML text that is placed before/after a certain number of <br> tags?

For example, I have this,

<div>
    <div><img src=""></div>
    <br>
    <h3>title</h3>
    <span>some text here</span>
    <br>
    Some text that I want to remove.
    <br>
    <br>
</div>

I'd like to remove the text just before the last two <br> tags or, put another way, the text after the second <br>.

I tried explode() with <br> and dropped the last two array elements with array_pop(). However, I then had to append </div> to close the outer tag, which is not a good idea when the outer tag changes dynamically.

Does anybody have a solution for this?

Solution 3

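Okay, this is what I came up with. It may not be the most efficient way, but I'll share it. I used a DOMinnerHTML() helper (defined at the end of the code below) and preg_split(). This drops everything from the third-to-last <br> tag onward; the remaining pieces are rejoined with spaces, so the earlier <br> tags are not kept either.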

<?php 
$html = <<<STR
<div>
    <div><img src=""></div>
    <br>
    <h3>title</h3>
    <span>some text here</span>
    <br>
    Some text that I want to remove.
    <br>
    <br>
</div>
STR;

$doc = new DOMDocument;
$doc->loadHTML($html);
$node = $doc->getElementsByTagName('div')->item(0);
$innerHtml = DOMinnerHTML($node);
$arrHtml = preg_split('/<br\b[^>]*>/i', $innerHtml);     // split the string into an array at each <br> or <br />
array_splice($arrHtml, -3);     // remove the last three elements
$edited = implode(" ", $arrHtml);

echo $edited;

function DOMinnerHTML($element) 
{ 
    $innerHTML = ""; 
    $children = $element->childNodes; 
    foreach ($children as $child) 
    { 
        $tmp_dom = new DOMDocument(); 
        $tmp_dom->appendChild($tmp_dom->importNode($child, true)); 
        $innerHTML.=trim($tmp_dom->saveHTML()); 
    } 
    return $innerHTML; 
} 
?>
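
As an alternative sketch, the same DOM tree can be edited directly instead of splitting the serialized HTML. The helper name removeAfterNthBr() is hypothetical (it is not part of PHP or of the answer above), and the code assumes PHP 7.1 or later and that the <br> tags are direct children of the outer element. Unlike the code above, it keeps the first two <br> tags and the outer element's closing tag, whatever that element is.

```php
<?php
// Hypothetical helper: remove every child of $parent that follows the
// $n-th <br> among its direct children. The $n-th <br> itself is kept.
function removeAfterNthBr(DOMNode $parent, int $n): void
{
    $seen = 0;
    $toRemove = [];
    foreach ($parent->childNodes as $child) {
        if ($seen >= $n) {
            // Collect first: removing nodes while iterating the live
            // childNodes list would skip some of them.
            $toRemove[] = $child;
        } elseif (strtolower($child->nodeName) === 'br') {
            $seen++;
        }
    }
    foreach ($toRemove as $child) {
        $parent->removeChild($child);
    }
}

$html = '<div><div><img src=""></div><br><h3>title</h3>'
      . '<span>some text here</span><br>Some text that I want to remove.<br><br></div>';

$doc = new DOMDocument;
$doc->loadHTML($html);
$outer = $doc->getElementsByTagName('div')->item(0);   // the outer <div>
removeAfterNthBr($outer, 2);

$result = $doc->saveHTML($outer);   // serialize only the outer element
echo $result;
```

Passing a different $n moves the cut-off point. If the trailing <br> tags should survive, the loop could skip nodes whose name is br instead of collecting them.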

OTHER TIPS

In addition to Joshua's answer: if you want an easier way, you can use the Simple HTML DOM library, available at the link below. Go through its documentation. The library comes in handy for problems like this one and for scraping web content.

http://simplehtmldom.sourceforge.net/

What you want is string matching with regular expressions: capture the text that comes before the last two <br> tags and after the preceding <br> tag. See the following:

http://www.regular-expressions.info/php.html
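
A minimal sketch of that regular-expression idea, applied to the markup from the question. The pattern is my own assumption, not taken from the linked page, and it only works when the unwanted text contains no tags of its own. It matches a <br>, then plain text, then two more <br> tags, and puts back only the tags.

```php
<?php
$html = <<<STR
<div>
    <div><img src=""></div>
    <br>
    <h3>title</h3>
    <span>some text here</span>
    <br>
    Some text that I want to remove.
    <br>
    <br>
</div>
STR;

// Group 1: the <br> that precedes the unwanted text.
// [^<]*  : the unwanted text itself (assumed to contain no tags).
// Group 2: the two <br> tags that follow it, allowing whitespace between them.
$pattern = '/(<br\b[^>]*>)[^<]*((?:\s*<br\b[^>]*>){2})/i';

// Keep the three <br> tags, drop the text between them.
$cleaned = preg_replace($pattern, '$1$2', $html);

echo $cleaned;
```

Because the outer element is never touched, its closing tag stays in place even when the element changes. The whitespace around the removed text is dropped along with it.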

I did the following:

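// Returns everything in $str before the ($limit + 1)-th occurrence of $tag.
function limitTag($str, $tag, $limit) {
  $array = explode($tag, $str);
  $newStr = '';
  $i = 0;
  foreach ($array as $child) {
    if ($i <= $limit) {                // keep pieces 0..$limit
      if ($i > 0) $newStr .= $tag;     // put the separator back between pieces
      $newStr .= $child;
      $i++;
    } else break;
  }
  return $newStr;
}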
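
For illustration, here is a compact sketch of what limitTag() appears intended to do, with a usage example. The name limitTagCompact() is hypothetical, and the sketch assumes a non-negative $limit and PHP 7 or later. It matches $tag as an exact string, so '<br>' will not match '<br />'.

```php
<?php
// Hypothetical compact variant: keep the first $limit + 1 pieces and
// rejoin them with the tag, i.e. everything before the
// ($limit + 1)-th occurrence of $tag.
function limitTagCompact(string $str, string $tag, int $limit): string
{
    return implode($tag, array_slice(explode($tag, $str), 0, $limit + 1));
}

$inner = 'a<br>b<br>c<br><br>';
$kept  = limitTagCompact($inner, '<br>', 1);

echo $kept;   // prints "a<br>b"
```

The result stops before the second <br>, so any closing tag of an outer element is cut off as well and has to be added back by the caller.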

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow
