Question

I'm building a spider which will traverse various sites and mine them for data.

Since I need to fetch each page separately, this could take a VERY long time (maybe 100 pages). I've already called set_time_limit() to allow 2 minutes per page, but it seems like Apache will kill the script after 5 minutes regardless.
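For illustration, the per-page reset looks roughly like this (fetch_page() is a placeholder for the real fetch code):

    // set_time_limit() restarts the timeout counter each time it is
    // called, so each page gets a fresh 2-minute budget
    foreach ($urls as $url) {
        set_time_limit(120);       // 2 minutes for this page
        $html = fetch_page($url);  // placeholder, e.g. file_get_contents()
        // ... mine the page ...
    }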

This isn't usually a problem since it will run from cron or something similar, which doesn't have this time limit. However, I would also like the admins to be able to start a fetch manually via an HTTP interface.

It is not important that Apache is kept alive for the full duration; I'm going to use AJAX to trigger a fetch and check back once in a while with AJAX.

My problem is how to start the fetch from within a PHP script without the fetch being terminated when the script calling it dies.

Maybe I could use system('script.php &'), but I'm not sure it will do the trick. Any other ideas?


Solution

    $cmd = "php myscript.php $params > /dev/null 2>/dev/null &";

    # when we call this particular command, the rest of the script 
    # will keep executing, not waiting for a response
    shell_exec($cmd);

What this does is send all the STDOUT and STDERR output to /dev/null and background the command with the trailing &, so shell_exec() returns immediately and your script keeps executing. Even if the 'parent' script finishes before myscript.php does, myscript.php will run to completion. (If $params comes from the HTTP interface, run it through escapeshellarg() first.)
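To let the AJAX side check back on progress, one possible sketch (the PID-file path is a made-up example, and posix_kill() requires the posix extension): have myscript.php write its PID at startup with file_put_contents('/tmp/fetch.pid', getmypid()), and then a small status endpoint can test whether that process is still alive:

    // status endpoint polled by the AJAX call: read the PID that
    // myscript.php wrote at startup and check if the process exists
    $pid = (int) @file_get_contents('/tmp/fetch.pid');

    // signal 0 sends nothing; posix_kill() just reports whether the
    // process can be signalled, i.e. whether it is still running
    $running = ($pid > 0) && posix_kill($pid, 0);

    echo $running ? 'still fetching' : 'done';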

OTHER TIPS

If you don't want to use exec, you can use a built-in PHP function!

ignore_user_abort(true);

This will tell the script to keep running even if the connection between the browser and the server is dropped ;)
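Note that ignore_user_abort() only covers the dropped connection; PHP's own execution time limit still applies, so a long fetch would pair it with set_time_limit(0). A minimal sketch (run_long_fetch() is a placeholder for the actual crawl):

    // keep running after the browser disconnects, and lift PHP's own
    // execution limit (the web server's timeout may still apply)
    ignore_user_abort(true);
    set_time_limit(0);

    run_long_fetch();   // placeholder for the actual crawl loop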

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow