Question

I am using Simple HTML DOM to find the links on a certain page:

// Find all links
foreach($html->find('a') as $element)
       echo $element->href . '<br>'; 

This finds all the links on the page. However, I also want to visit each found link and find the links inside it, recursively, for example down to level 5.

Any idea how to go about this?


Solution

Use a recursive function and keep track of the depth:

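function findLinks($url, $depth, $maxDepth) {
  // Fetch and parse $url; file_get_html() returns false on failure
  $html = file_get_html($url);
  if (!$html)
    return;
  foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
    // Follow the link only while below the depth limit,
    // so no page is fetched just to be ignored
    if ($depth < $maxDepth)
      findLinks($element->href, $depth + 1, $maxDepth);
  }
}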

You would start it with a call such as findLinks($rootUrl, 1, 5).
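
A recursive crawl like this can loop forever when pages link back to each other, so it usually helps to remember which URLs were already visited. Below is a minimal sketch of that idea, not a definitive implementation. To keep it self-contained it uses PHP's built-in DOMDocument instead of Simple HTML DOM, and it takes a hypothetical $fetch callable (URL in, HTML string or false out) so it can be tried on in-memory pages. The names extractLinks, crawl, $fetch and $visited are assumptions made for this example, and it assumes the hrefs are absolute URLs.

```php
<?php
// Return the href of every <a> element in an HTML string.
function extractLinks(string $htmlText): array {
    if (trim($htmlText) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Suppress warnings caused by malformed real-world HTML.
    @$doc->loadHTML($htmlText);
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Depth-limited recursive crawl. $visited maps URL => depth at which it was
// first reached, and prevents fetching the same URL twice.
function crawl(string $url, int $depth, int $maxDepth, callable $fetch, array &$visited): void {
    if ($depth > $maxDepth || isset($visited[$url])) {
        return;
    }
    $visited[$url] = $depth;
    $htmlText = $fetch($url);
    if (!is_string($htmlText)) {
        return; // fetch failed
    }
    foreach (extractLinks($htmlText) as $href) {
        crawl($href, $depth + 1, $maxDepth, $fetch, $visited);
    }
}

// Usage with in-memory pages (hypothetical URLs). For a real site, $fetch
// could wrap file_get_contents() instead.
$site = [
    'http://example.test/'  => '<a href="http://example.test/a">A</a><a href="http://example.test/b">B</a>',
    'http://example.test/a' => '<a href="http://example.test/c">C</a><a href="http://example.test/">home</a>',
    'http://example.test/b' => '',
    'http://example.test/c' => '<a href="http://example.test/d">D</a>',
    'http://example.test/d' => '',
];
$fetch = function (string $url) use ($site) {
    return $site[$url] ?? false;
};
$visited = [];
crawl('http://example.test/', 1, 3, $fetch, $visited);
foreach ($visited as $url => $depth) {
    echo $depth . ' ' . $url . "\n";
}
```

With a maximum depth of 3, the page /d (which would be at depth 4) is never fetched, and the link from /a back to the root is skipped because the root is already in $visited.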

OTHER TIPS

I needed a similar feature in the past. What you can do is use MySQL to store your links.

In my case I had a todo table and a pages table. Seed the todo table with the URLs you want to spider.

For each page, I extracted the information I needed (plaintext and title) and stored it in the pages table. Then I looped through the page's links and added them to the todo table. The last step was to remove the current page from the todo table and move on to the next URL.

while the todo table is not empty
{
   grab a URL from todo
   get the current page's title and plaintext and store them in the pages table
   loop through the page's links and add the found links to the todo table
   remove the current page from todo
}
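
The todo/pages loop above can be sketched in PHP as follows. This is only an illustration under stated assumptions: plain PHP arrays stand in for the MySQL todo and pages tables, the built-in DOMDocument does the parsing, and spider, $fetch, $seen and $maxPages are hypothetical names introduced for this example. $fetch is a callable that returns the HTML for a URL or false on failure.

```php
<?php
// Queue-based spider: arrays stand in for the todo and pages tables.
function spider(array $seedUrls, callable $fetch, int $maxPages = 100): array {
    $todo = [];   // stand-in for the todo table (URLs still to process)
    $seen = [];   // every URL ever queued, so nothing is queued twice
    $pages = [];  // stand-in for the pages table, keyed by URL

    foreach ($seedUrls as $url) {
        if (!isset($seen[$url])) {
            $seen[$url] = true;
            $todo[] = $url;
        }
    }

    while ($todo && count($pages) < $maxPages) {
        // Grab a URL from todo and remove it from the list.
        $url = array_shift($todo);
        $htmlText = $fetch($url);
        if (!is_string($htmlText) || trim($htmlText) === '') {
            continue; // nothing to store for this URL
        }

        $doc = new DOMDocument();
        // Suppress warnings caused by malformed real-world HTML.
        @$doc->loadHTML($htmlText);

        // Store the page title and plaintext.
        $titleNode = $doc->getElementsByTagName('title')->item(0);
        $bodyNode = $doc->getElementsByTagName('body')->item(0);
        $pages[$url] = [
            'title' => $titleNode ? trim($titleNode->textContent) : '',
            'plaintext' => $bodyNode ? trim($bodyNode->textContent) : '',
        ];

        // Add newly found links to todo.
        foreach ($doc->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href');
            if ($href !== '' && !isset($seen[$href])) {
                $seen[$href] = true;
                $todo[] = $href;
            }
        }
    }
    return $pages;
}

// Usage with in-memory pages (hypothetical URLs).
$site = [
    'http://example.test/' => '<html><head><title>Home</title></head><body>'
        . '<a href="http://example.test/a">A</a><a href="http://example.test/b">B</a></body></html>',
    'http://example.test/a' => '<html><head><title>Page A</title></head><body>'
        . '<a href="http://example.test/">home</a></body></html>',
];
$fetch = function (string $url) use ($site) {
    return $site[$url] ?? false;
};
$pages = spider(['http://example.test/'], $fetch);
foreach ($pages as $url => $page) {
    echo $url . ' => ' . $page['title'] . "\n";
}
```

Because URLs are taken from the front of the list, pages are processed in the order they were discovered. With a real database, the $seen check would typically become a unique index on the URL column.
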
Licensed under: CC-BY-SA with attribution