Question

Fellas!

I have one nasty page to parse but can't figure out how to extract correct data blocks from it using Simple HTML DOM, because it has no CSS child selector support.

HTML:

<ul class="ul-block">
   <li>xxx</li>
   <li>xxx</li>
   <li>
      <ul>
         <li>xxx2</li>
      </ul>
   </li>
</ul>

How would I extract only the direct child li elements of the parent ul.ul-block?

$node->find('ul[class=ul-block] > li'); doesn't work, and $node->find('ul[class=ul-block] li'); of course also finds the nested descendant li elements :(

Solution

A simple example using PHP's built-in DOM extension with XPath, where the single "/" before li selects direct children only:

$dom = new DOMDocument;
$dom->loadHTML('
<ul class="ul-block">
   <li>a</li>
   <li>b</li>
   <li>
      <ul>
         <li>c</li>
      </ul>
   </li>
</ul>
');

$xpath = new DOMXPath($dom);
// "/li" matches only direct children of the ul; "//li" would match every descendant li.
foreach ($xpath->query('//ul[@class="ul-block"]/li') as $liNode) {
    echo $liNode->nodeValue, '<br />';
}
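
One caveat with the query above: @class="ul-block" compares the whole attribute value, so it would miss a list written as class="ul-block wide". A minimal sketch of a more tolerant variant follows. The helper name directListItems and the extra class "wide" are made up for illustration, and the helper assumes $class is a plain class token without quotes.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the trimmed
// text of every <li> that is a direct child of a <ul> carrying $class.
function directListItems(string $html, string $class): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for messy markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the class attribute with spaces so $class matches as a whole token,
    // even when the element has several classes.
    $query = sprintf(
        '//ul[contains(concat(" ", normalize-space(@class), " "), " %s ")]/li',
        $class
    );

    $items = [];
    foreach ($xpath->query($query) as $li) {
        // textContent includes the text of nested elements as well.
        $items[] = trim($li->textContent);
    }
    return $items;
}

$html = '<ul class="ul-block wide"><li>a</li><li>b</li><li><ul><li>c</li></ul></li></ul>';
print_r(directListItems($html, 'ul-block'));
```

The call returns three entries, one per direct child. The third entry is the text of the outer li, which here is just the "c" from its nested list; the nested li itself is not matched as a separate item.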

OTHER TIPS

I had the same issue, and used the children method to grab just the first level items.

<ul class="my-list">
    <li>
        <a href="#">Some Text</a>
        <ul>
            <li><a href="#">Some Inner Text</a></li>
            <li><a href="#">Some Inner Text</a></li>
            <li><a href="#">Some Inner Text</a></li>
            <li><a href="#">Some Inner Text</a></li>
        </ul>
    </li>
    <li>
        <a href="#">Some Text</a>
        <ul>
            <li><a href="#">Some Inner Text</a></li>
            <li><a href="#">Some Inner Text</a></li>
            <li><a href="#">Some Inner Text</a></li>
            <li><a href="#">Some Inner Text</a></li>
        </ul>
    </li>
</ul>

And here's the Simple HTML DOM code to get just the first-level li items:

$html = file_get_html( $url );
$first_level_items = $html->find( '.my-list', 0)->children();

foreach ( $first_level_items as $item ) {
    // ... do stuff ...
}
Licensed under: CC-BY-SA with attribution