Question

The scenario:

TL;DR - I need a queue system for triggering jobs based on a future timestamp and NOT on the order it is inserted

I have a MySQL database of entries that detail particular events that need to be performed (which will consist mostly of a series of arithmetic calculations and a database insert/update) in a precise sequence based on timestamps. The time the entry is inserted and when the event will be "performed" has no correlation and is determined by outside factors. The table also contains a second column of milliseconds which increases the timing precision.

This table is part of a job "queue" which will contain entries set to execute from anywhere between a few seconds to a few days in the future, and can potentially have up to thousands of entries added every second. The queue needs to be parsed constantly (every second?) - perhaps by doing a select of all timestamps that have expired during this second and sorting by the milliseconds, and then executing each event detailed by the entries.

The problem

Currently the backend is completely written in PHP on an apache server with MySQL (ie standard LAMP architecture). Right now, the only way I can think of to achieve what I've specified is to write a custom PHP job queue script that will do the parsing and execution, looped every second using this method. There are no other job systems that I'm aware of which can queue jobs according to a specified timestamp/millisecond rather than the entry time.

This method however sounds rather infeasible CPU wise even on paper - I have to perform a huge MySQL query every second and execute some sort of function for each row retrieved, with the possibility of it running over a second of execution time which will start introducing delays to the parsing time and messing up the looping script.

I am of course attempting to create a solution that will be scalable should there be heavy traffic on the system, which this solution fails miserably as it will continue falling behind as the number of entries get larger.

The questions

I'd prefer to stick to the standard LAMP architecture, but is there any other technology I can integrate nicely into the stack that is better equipped to deal with what I'm attempting to do here?

Is there another method entirely to to accurately trigger events at a specified future date without the messy fiddling about with the constant queue checking?

If neither of the above options are suitable, is there a better way to loop the PHP script in the background? In the worst case scenario I can accept the long execution times and split the task up between multiple 'workers'.

Update

RabbitMQ was a good suggestion, but unfortunately doesn't execute the task as soon as it 'expires' - it has to go through a queue first and wait up on any tasks in front that have yet to expire. The expiry time has a wide range between a few seconds to a few days, and the queue needs to be sorted somehow each time a new event is added in so the expiry time is always in order in the queue. This isn't possible as far as I'm aware of in RabbitMQ, and doesn't sound very efficient either. Is there an alternative or a programmatic fix?

Was it helpful?

Solution

Sometimes, making a square peg fit into a round hole takes too much effort. While using MySQL to create queues can be effective, it gets much trickier to scale. I would suggest that this might be an opportunity for RabbitMQ.

Basically, you would setup a message queue that you can put the events into. You would then have a "fanout" architecture with your workers processing each queue. Each worker would listen to the queue and check to see if the particular event needs to be processed. I imagine that a combination of "Work Queues" and the "Routing" techniques available in Rabbit would achieve what you are looking for in a scalable and reliable way.

I would envision a system that works something like this:

  1. spawn workers to listen to queues, using routing keys to prune down how many messages they get
  2. each worker checks the messages to see if they are to be performed now
  3. if the message is to be performed, perform it and acknowledge -- otherwise, re-dispatch the message for future processing. There are some simple techniques available for this.

As you need more scale, you add more workers. RabbitMQ is extremely robust and easy to cluster as well when you eventually cap out your queue server. There are also other cloud-based queuing systems such as Iron.IO and StormMQ

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top