Question

Perhaps my approach is not correct; all advice is appreciated. I am trying to scrape all a tags from a web page and order them as follows: h1, h2, h3, h4, h5, h6, a (all the rest). My code so far:

$layout['h1']=$html->find('h1 a');
$layout['h2']=$html->find('h2 a');
$layout['h3']=$html->find('h3 a');
$layout['h4']=$html->find('h4 a');
$layout['h5']=$html->find('h5 a');
$layout['h6']=$html->find('h6 a');
$layout['a']=$html->find('a');

// Then I can print the arrays as follows
foreach ($layout as $key => $links) {
    foreach ($links as $text) {
        echo $text . ' |  ';
    }
}

This all works well when outputting. However, $layout['a'] contains every a tag, including the ones already listed under the headings. My question is: how can I get only the a tags that do not have a heading tag wrapped around them?

Could I, for example, just get all the a tags with $layout['all_links'] = $html->find('a') and then loop to remove every tag that is wrapped in a heading tag, keeping what is left over? I am not sure how to code this, perhaps with unset?

Thanks in advance. I have tried to think of many ways to do it but am at a loss. Does anyone have a better suggestion, or should I rethink the entire function?


Solution

You can do something like:

foreach ($html->find('a') as $a) {
  // Print only the links whose direct parent is not a heading tag (h1..h6)
  if (!preg_match('/^h[1-6]$/', $a->parent->tag)) echo $a;
}

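That will skip the a's whose parent is an h1..h6 and print the rest. Note that it only checks the direct parent, so a link nested deeper inside a heading (for example h2 > span > a) would still be printed.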
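
If links can sit more than one level below a heading, one option is to test every ancestor instead of only the parent. The sketch below is not the Simple HTML DOM code from the question: it swaps in PHP's built-in DOMDocument and DOMXPath so that it runs on its own, and the sample markup in $markup is made up purely for illustration. It builds the same $layout array as in the question, with the 'a' entry holding only the links that have no heading anywhere above them.

```php
<?php
// Hypothetical sample markup, for illustration only.
$markup = '<h1><a href="/a">Top</a></h1>'
        . '<h2><span><a href="/b">Nested</a></span></h2>'
        . '<p><a href="/c">Plain</a></p>'
        . '<div><a href="/d">Other</a></div>';

// Assumption: PHP's bundled DOM extension stands in for Simple HTML DOM here.
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

$layout = [];

// Links at any depth inside each heading level, h1 first.
for ($i = 1; $i <= 6; $i++) {
    $layout["h$i"] = [];
    foreach ($xpath->query("//h$i//a") as $a) {
        $layout["h$i"][] = $a->textContent;
    }
}

// All remaining links: those with no h1..h6 among their ancestors.
$rest = '//a[not(ancestor::h1 or ancestor::h2 or ancestor::h3'
      . ' or ancestor::h4 or ancestor::h5 or ancestor::h6)]';
$layout['a'] = [];
foreach ($xpath->query($rest) as $a) {
    $layout['a'][] = $a->textContent;
}

// Print in the requested order: h1..h6 first, then everything else.
foreach ($layout as $tag => $links) {
    foreach ($links as $text) {
        echo $text . ' | ';
    }
}
```

With Simple HTML DOM itself, the same idea would probably mean walking up the parent chain of each a tag in a loop until a heading or the root is reached, rather than using an XPath expression.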

Licensed under: CC-BY-SA with attribution