Question

I need to search for a string that can look like any of these:

<div class="icon_star">&nbsp;</div>

or

<div class="icon_star"></div>

or

<div class="icon_star"> </div>

I need to find the strings above in HTML that could look something like this:

<h1 class="redword" tag="h1">
   <span class="BASE">good</span>
</h1>
<span class="headword-definition">&#160;-&#160;definition</span>
</span>
<div class="icon_star"></div>
<!-- End of DIV icon_star-->

<div class="icon_star"></div>
<!-- End of DIV icon_star-->

<div class="icon_star"></div>
<!-- End of DIV icon_star-->

</div><!-- End of DIV -->

<div class="headbar">
   <div id="helplinks-box" class="responsive_hide_on_smartphone">  

The string we are searching for can occur multiple times, and every occurrence should be stored in an array.

I have tried using the following regex:

preg_match_all ('/<div(\s)+class="icon_star">(.*?)<\/div>/i', $html1, $result_array1);

This regex does not work when the HTML to be searched is:

<div id="headword">
    <div id="headwordright">
        <div style="display: none;" id="showmore"><a class="button" onmousedown="foldingSet(false)"><span class="label">Show more</span></a>
        </div><!-- End of DIV -->
        <div id="showless"><a class="button" onmousedown="foldingSet(true)"><span class="label">Show less</span></a>
        </div><!-- End of DIV -->
    </div><!-- End of DIV -->
    <span class="BASE-FORM">
        <h1 tag="h1" class="redword"><span class="BASE">scenario</span></h1>
        <span class="headword-definition">&nbsp;-&nbsp;definition</span>
    </span>
    <div class="icon_star">&nbsp;</div><!-- End of DIV icon_star-->
</div>

Solution

Update

It seems that you are reading your regex results the wrong way. Executing

preg_match_all('/<div(\s)+class="icon_star">.*?<\/div>/i', $html, $result_array1);

for($x = 0; $x < count($result_array1); $x++)
    $result_array1[$x] = array_map('htmlentities', $result_array1[$x]);

echo '<pre>' . print_r($result_array1, 1);

prints out

   Array
   (
       [0] => Array
       (
           [0] => <div class="icon_star">&nbsp;</div>
       )

       [1] => Array
       (
           [0] =>  
       )

   )   

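so you should be checking the count of $result_array1[0] (the full matches) instead of $result_array1, whose top-level elements correspond to the whole pattern and to each capture group.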
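
A minimal sketch of how the result array is laid out, assuming the three variants from the question are held in a sample `$html` string (made up here for illustration). Index 0 holds the full matches and index 1 holds what the `(\s)` group captured, so the number of matches is `count($matches[0])`, which is also the value `preg_match_all` returns:

```php
<?php
// Sample input (assumed for illustration): the three variants from the question.
$html = '<div class="icon_star">&nbsp;</div>'
      . '<div class="icon_star"></div>'
      . '<div class="icon_star"> </div>';

// Same pattern as above; (\s) is capture group 1.
$count = preg_match_all('/<div(\s)+class="icon_star">.*?<\/div>/i', $html, $matches);

// $matches[0] lists the full matches, $matches[1] the text captured by group 1.
// count($matches) is the number of groups plus one (here 2), not the number of matches.
echo count($matches), "\n";     // 2
echo count($matches[0]), "\n";  // 3
echo $count, "\n";              // 3: preg_match_all also returns the match count

// With PREG_SET_ORDER each element is one match, so count() gives the match count directly.
preg_match_all('/<div\s+class="icon_star">.*?<\/div>/i', $html, $sets, PREG_SET_ORDER);
echo count($sets), "\n";        // 3
```

If the capture group is not needed, dropping the parentheses around `\s` (as in the second call) leaves only index 0 in the default result layout.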

Side note

Instead of parsing HTML with regex, you could use the DOMDocument class built into PHP, if that is an option.
The following code extracts the three divs.

Be aware that this method works reliably only on well-formed HTML; the parser tries to repair broken markup, and the result may not be what you expect.

  //your HTML with tag added to make it valid
  $html = '<div>
     <h1 class="redword" tag="h1">
        <span class="BASE">good</span>
     </h1>
     <span class="headword-definition"><span>&#160;-&#160;definition</span></span>
     <div class="icon_star"></div>
     <div class="icon_star"></div>
     <div class="icon_star"></div>
  </div>
  <div class="headbar">
     <div id="helplinks-box" class="responsive_hide_on_smartphone">
     </div>
  </div>';

  $dom = new DOMDocument();
  @$dom->loadHTML($html);
  $x = new DOMXPath($dom);

  //this xpath query looks for all nodes whose "class" attribute value contains "icon_star"
  $nodes = $x->query("//*[contains(@class, 'icon_star')]");

  $res = '';
  foreach($nodes as $node) {
     /**
      * @var $node DOMElement
      */
     $res .= $dom->saveHTML($node);
  }

  echo htmlentities($res);
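
Because `contains(@class, 'icon_star')` is a substring test, it would also match a class such as `icon_star_big`. Below is a sketch of a stricter variant, assuming you want only elements whose class list includes the exact token `icon_star`. The sample markup and the `icon_star_big` class are made up for illustration, and `libxml_use_internal_errors()` is used in place of `@` to silence parser warnings:

```php
<?php
// Hypothetical sample: two exact matches and one look-alike class.
$html = '<div>
   <div class="icon_star">&nbsp;</div>
   <div class="big icon_star"> </div>
   <div class="icon_star_big"></div>
</div>';

// Collect parser warnings internally instead of suppressing them with @.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// Pad the class list with spaces so only the whole token "icon_star" matches.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' icon_star ')]";

$found = [];
foreach ($xpath->query($query) as $node) {
    $found[] = $dom->saveHTML($node);
}

echo count($found), "\n"; // 2
```

Collecting the nodes into an array rather than one concatenated string also gives you the "store in array" result the question asks for.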

You may also find the following Stack Overflow questions useful:
How do you parse and process HTML/XML in PHP?
Getting DOM elements by classname

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow