Question

I have about 1,000 to 10,000 jobs that I need to run continuously, roughly once a minute each. Occasionally a new job comes in or an existing one needs to be cancelled, but that is rare. Jobs are tagged and must be distributed among workers, each of which processes only jobs of a specific kind.

For now I plan to use cron to load the whole database of jobs into a broker, either RabbitMQ or beanstalkd (I haven't decided which one yet).

But this approach seems ugly to me (using a timer to simulate an infinite loop, loading the whole database, etc.) and it has a disadvantage: if some kind of job is processed more slowly than it is added to the queue, the queue may be overwhelmed, and the message broker will eat all the RAM and swap and then just halt.

Are there any other possibilities? Am I using the wrong patterns for this job? (Maybe I don't need a queue at all?)

P.S. I'm using Python, if that matters.


Solution

You create your initial batch of jobs and add them to the queue. You have n consumers of the queue, each running the jobs. Adding more consumers simply spreads the jobs round-robin across every listening consumer, which lets you scale horizontally.

Each job can, upon completion, be responsible for resubmitting itself to the queue. This means that your job queue won't grow beyond the length it had when you initialised it. A master job can, if need be, spawn sub-jobs and add them to the queue.
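The resubmission idea can be sketched in-process with the standard library. This is only an illustration under assumptions: `queue.Queue` stands in for the broker, threads stand in for consumers, and the function name, the `interval` and `duration` parameters, and the counter used as "work" are all hypothetical.

```python
import queue
import threading
import time


def run_self_resubmitting_jobs(job_ids, n_workers=3, interval=0.05, duration=0.4):
    """Run every job repeatedly; each job re-enqueues itself when it finishes.

    Returns (runs per job, queue depth after all workers have stopped).
    """
    jobs = queue.Queue()  # stand-in for a broker queue (assumption)
    for job_id in job_ids:
        jobs.put((time.monotonic(), job_id))  # (earliest next run time, job)

    runs = {job_id: 0 for job_id in job_ids}
    lock = threading.Lock()
    stop_at = time.monotonic() + duration

    def worker():
        while time.monotonic() < stop_at:
            try:
                due, job_id = jobs.get(timeout=0.01)
            except queue.Empty:
                continue
            # Wait until the job is due again before running it.
            delay = due - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            with lock:
                runs[job_id] += 1  # placeholder for the real work
            # The job puts itself back, so the queue never holds
            # more entries than the initial batch.
            jobs.put((time.monotonic() + interval, job_id))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return runs, jobs.qsize()
```

Because a worker only exits between jobs, every job is back in the queue once the workers stop, so the final depth equals the size of the initial batch. With a real broker the same shape would apply, with the resubmit step being a publish back to the queue.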

For different types of jobs it is probably a good idea to use different queues. That way you can balance the load more effectively by having different quantities/horsepower of workers running the jobs from the different queues.
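A minimal sketch of one queue per job type, again using in-process `queue.Queue` objects and threads as stand-ins for broker queues and worker processes. The function name and the example tags (`"fetch"`, `"parse"`) are hypothetical, and the sketch assumes every job's tag has an entry in `workers_per_tag`.

```python
import queue
import threading


def process_by_tag(jobs, workers_per_tag):
    """Route (tag, payload) jobs to a per-tag queue with its own worker pool.

    Returns a list of (worker_tag, payload) pairs, one per handled job.
    """
    queues = {tag: queue.Queue() for tag in workers_per_tag}
    handled = []
    lock = threading.Lock()

    def worker(tag):
        q = queues[tag]  # a worker only ever reads its own tag's queue
        while True:
            payload = q.get()
            if payload is None:  # sentinel: no more jobs for this worker
                break
            with lock:
                handled.append((tag, payload))  # placeholder for real work

    threads = []
    for tag, count in workers_per_tag.items():
        for _ in range(count):
            t = threading.Thread(target=worker, args=(tag,))
            t.start()
            threads.append(t)

    for tag, payload in jobs:
        queues[tag].put(payload)  # raises KeyError for an unknown tag
    for tag, count in workers_per_tag.items():
        for _ in range(count):
            queues[tag].put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return handled
```

For example, `process_by_tag(jobs, {"fetch": 4, "parse": 1})` would give the slow `"fetch"` jobs four workers and the `"parse"` jobs one, which is the kind of per-queue balancing described above.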

The fact that you are running Python isn't important here; it's the pattern, not the language, that you need to nail first.

OTHER TIPS

You can use an asynchronous framework, e.g. Twisted.

I also don't think it's a good idea to have the cron daemon run a script every minute (for the reasons you mentioned), so I suggest Twisted. It doesn't give you any scheduling benefit, but you get flexibility in process management and memory sharing.
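To show the general shape of a long-running process that repeats jobs itself instead of being started by cron, here is a small sketch. Note that it uses the standard library's `asyncio` as a stand-in and is not Twisted code; the function names, intervals, and the list used as a log are all assumptions made for illustration.

```python
import asyncio


async def run_periodically(job, interval, times):
    """Call job() `times` times, sleeping `interval` seconds after each call."""
    for _ in range(times):
        job()
        await asyncio.sleep(interval)  # yields so other jobs can run meanwhile


async def main():
    log = []  # shared in-process state, no broker or cron involved
    await asyncio.gather(
        run_periodically(lambda: log.append("a"), 0.01, 3),
        run_periodically(lambda: log.append("b"), 0.02, 2),
    )
    return log
```

Calling `asyncio.run(main())` runs both periodic jobs in a single process. A real service would loop indefinitely rather than a fixed number of times.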

Licensed under: CC-BY-SA with attribution