Question

I want to write a worker for beanstalkd in PHP, using a Zend Framework 2 controller. It is started from the CLI and will run forever, asking beanstalkd for jobs along the lines of this example.

In simple pseudo-like code:

while (true) {
    // reserve() blocks until a job is available on the queue
    $data   = $beanstalk->reserve();

    // the job payload says which class to run and with which parameters
    $class  = $data->class;
    $params = $data->params;

    $job    = new $class($params);
    $job();
}

The $job here has an __invoke() method, of course. However, some of these jobs might run for a long time. Some might use a considerable amount of memory. Some might have the $beanstalk object injected so they can start new jobs themselves, or hold a Zend\Di\Locator instance to pull objects from the DIC.
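
For context, a job class might look roughly like this (the class name and comments are my own illustration, not code from the question):

class ExampleJob
{
    protected $params;

    public function __construct($params)
    {
        $this->params = $params;
    }

    // Called by the worker loop; may run for a long time and may hold
    // references to injected services such as $beanstalk or a locator.
    public function __invoke()
    {
        // ... do the actual work described by $this->params ...
    }
}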

I am worried about this setup in production environments over the long term, as circular references might occur and (at this moment) I do not explicitly do any garbage collection, while this action might run for weeks/months/years*.

*) In beanstalkd, reserve is a blocking call; if no job is available, this worker will wait until it gets a response back from beanstalkd.

My question: how will PHP handle this in the long term, and should I take any special precautions to keep this from blocking?

Here is what I have considered so far, which might help (but please correct me if I am wrong, and add more if possible):

  1. Use gc_enable() before starting the loop
  2. Use gc_collect_cycles() in every iteration
  3. Unset $job in every iteration
  4. Explicitly unset references in a $job's __destruct() (see the sketch below)
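
For item 4, a minimal sketch of what that could look like, assuming a job that holds a reference to an injected service (names and the way the dependency is injected are my own illustration):

class QueueingJob
{
    protected $params;
    protected $beanstalk;

    public function __construct($params, $beanstalk)
    {
        $this->params    = $params;
        $this->beanstalk = $beanstalk;
    }

    public function __invoke()
    {
        // ... possibly put new jobs on the queue via $this->beanstalk ...
    }

    public function __destruct()
    {
        // Drop references explicitly so this object does not keep the
        // shared services (and any cycles through them) alive.
        $this->beanstalk = null;
        $this->params    = null;
    }
}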

(NB: Update from here)

I did run some tests with arbitrary jobs. The jobs I included were: "simple", which just sets a value; "longarray", which creates an array of 1,000 values; "producer", which has $pheanstalk injected by the loop and adds three simple jobs to the queue (so there is now a reference from a job back to beanstalk); and "locatoraware", which is given a Zend\Di\Locator and instantiates all job types (though it does not invoke them). I added 10,000 jobs to the queue and then reserved all jobs in the queue.
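
For illustration, the two simplest test jobs could be as small as this (my own reconstruction of what the description above implies, not the actual test code):

class SimpleJob
{
    public function __invoke()
    {
        // "simple": just set a value
        $value = true;
    }
}

class LongArrayJob
{
    public function __invoke()
    {
        // "longarray": create an array of 1,000 values
        $array = range(1, 1000);
    }
}

The "producer" and "locatoraware" jobs additionally receive the $pheanstalk client and the Zend\Di\Locator, respectively, as described above.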

Results for the "simple" job (memory usage measured with memory_get_usage() every 1,000 jobs):

0:     56392
1000:  548832
2000:  1074464
3000:  1538656
4000:  2125728
5000:  2598112
6000:  3054112
7000:  3510112
8000:  4228256
9000:  4717024
10000: 5173024

Second run, picking a random job type for every job and measuring the same as above. Distribution of the job types:

["Producer"] => int(2431)
["LongArray"] => int(2588)
["LocatorAware"] => int(2526)
["Simple"] => int(2456)

Memory:

0:     66164
1000:  810056
2000:  1569452
3000:  2258036
4000:  3083032
5000:  3791256
6000:  4480028
7000:  5163884
8000:  6107812
9000:  6824320
10000: 7518020

The execution code from above was updated to this:

$baseMemory = memory_get_usage();
gc_enable();

for ( $i = 0; $i <= 10000; $i++ ) {
    $data = $beanstalk->reserve();

    $class = $data->class;
    $params = $data->params;

    $job = new $class($params);
    $job();

    $job = null;
    unset($job);

    // every 1,000 jobs, force a collection cycle and report memory usage
    if ( $i % 1000 === 0 ) {
        gc_collect_cycles();
        echo sprintf( '%8d: ', $i ), memory_get_usage() - $baseMemory, "<br>";
    }
}

As you can see, the memory consumption in PHP is not kept level at a minimum, but increases over time.

Solution 2

I ended up benchmarking my current code base line by line, after which I narrowed it down to this:

$job = $this->getLocator()->get($data->name, $params);

This uses Zend\Di for dependency injection, and its instance manager tracks instances for the complete lifetime of the process. So even after a job had been invoked and could have been removed, the instance manager still kept it in memory. Not using Zend\Di to instantiate the jobs immediately resulted in a constant memory consumption instead of a linearly growing one.
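
In other words, the change boils down to this (a simplified sketch; the surrounding worker loop stays as in the question):

// Before: Zend\Di's instance manager keeps a reference to every instance it
// creates, so invoked jobs are never freed for the lifetime of the process.
$job = $this->getLocator()->get($data->name, $params);

// After: instantiate the job class directly; once the loop unsets $job,
// nothing else holds a reference to it and it can be garbage collected.
$class = $data->class;
$job   = new $class($params);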

OTHER TIPS

I've usually restarted the script regularly - though you don't have to do it after every job is run (unless you want to; it's useful for clearing memory). You could, for example, run up to 100 or more jobs at a time, or until the script has used, say, 20 MB of RAM, and then exit the script, to be instantly re-run.
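
A sketch of that approach, assuming $beanstalk is the connected queue client from the question (the limits here are arbitrary; pick values that suit your jobs):

$maxJobs   = 100;
$maxMemory = 20 * 1024 * 1024; // 20 MB

for ($i = 0; $i < $maxJobs; $i++) {
    $data = $beanstalk->reserve();

    $class = $data->class;
    $job   = new $class($data->params);
    $job();
    unset($job);

    // Stop once the process has grown too large; whatever launched the
    // script (e.g. the shell loop below) starts a fresh one immediately.
    if (memory_get_usage() > $maxMemory) {
        break;
    }
}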

My blog post at http://www.phpscaling.com/2009/06/23/doing-the-work-elsewhere-sidebar-running-the-worker/ has some example shell scripts for re-running the scripts.

For memory safety, don't do the looping over jobs in PHP itself. Instead, just create a simple bash script to do the looping:

while true; do
    php do_jobs.php
done

where do_jobs.php contains something like:

// ... (bootstrap: connect to beanstalkd, set up autoloading, etc.) ...

$data   = $beanstalk->reserve();

$class  = $data->class;
$params = $data->params;

$job    = new $class($params);
$job();

// ... (clean up, e.g. delete the finished job from the queue) ...

Simple, right? ;)

Licensed under: CC-BY-SA with attribution