Question

I have a MySQL-based queue that manages tasks for several PHP workers that run every minute via cron. I'll simplify everything to make it easier to follow.

For the mysql part I have 2 tables:

worker_info

worker_id  |  name    | hash      | last_used
1          |  worker1 | d8f9zdf8z | 2014-03-03 13:00:01
2          |  worker2 | odfi9dfu8 | 2014-03-03 13:01:01
3          |  worker3 | sdz7std74 | 2014-03-03 13:02:03
4          |  worker4 | duf8s763z | 2014-03-03 13:02:01
...

tasks

task_id  | times_run | task_id | workers_used
1        | 3         | 2932    | 1,6,3
2        | 2         | 3232    | 6,8
3        | 6         | 5321    | 3,2,6,10,5,20
4        | 1         | 8321    | 3
...

tasks is the table that keeps track of the tasks:

task_id (the first column) identifies each task; times_run is the number of times a task has been successfully executed; the second task_id column holds a number the PHP script needs for its routines; workers_used is a text field that holds the IDs of all worker_info rows that have already been processed for this task. I don't want the same worker_info used multiple times per task, only once.

worker_info is a table that holds some information the PHP script needs to do its job, along with last_used, a global indicator of when this worker was last used.

Several PHP scripts work on the same tasks, and the values need to be exact, because each worker_info should be used only once per task.
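
In SQL terms, the simplified schema boils down to roughly the following (column types, indexes and the engine are only illustrative here, and I've named the second task_id column task_id_ext just to make the sketch valid SQL):

    CREATE TABLE worker_info (
        worker_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name       VARCHAR(64)  NOT NULL,
        hash       VARCHAR(64)  NOT NULL,
        last_used  DATETIME     NOT NULL,
        KEY idx_last_used (last_used)          -- the routine sorts on last_used
    ) ENGINE=InnoDB;

    CREATE TABLE tasks (
        task_id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        times_run    INT UNSIGNED NOT NULL DEFAULT 0,
        task_id_ext  INT UNSIGNED NOT NULL,    -- the second task_id column from above
        workers_used TEXT         NOT NULL,
        KEY idx_times_run (times_run)          -- the routine sorts on times_run
    ) ENGINE=InnoDB;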

The PHP cron jobs all run the same routine.

First, the script performs a MySQL query to get a task:

1. SELECT * FROM tasks ORDER BY times_run ASC LIMIT 1 (we always work on one task at a time)

The script locks the worker_info table so that the same worker_info cannot be selected by several concurrent task queries at once:

2. LOCK TABLES worker_info WRITE

Then it fetches the least recently used worker_info that has not yet been used for this task:

3. SELECT * FROM worker_info WHERE worker_id NOT IN($workers_used) ORDER BY last_used ASC LIMIT 1

Then it updates last_used so the same worker_info won't be selected by another script while this task is still running:

4. UPDATE worker_info SET last_used = NOW() WHERE worker_id = $worker_id

Finally, the lock gets released:

5. UNLOCK TABLES

The PHP script performs its routines, and if the task was successful, the task row gets updated:

6. UPDATE tasks SET times_run = times_run + 1, workers_used = IF(workers_used = '', '$worker_id', CONCAT(workers_used, ',$worker_id')) WHERE task_id = $task_id

I know it's bad practice to store workers_used this way instead of declaring the dependencies in a second table, but I'm worried about the space that would take. One task can have several thousand workers_used entries and I have several thousand tasks, so such a table would quickly grow past a million rows, and I feared that this could slow things down a lot, so I went with this way of storing it.
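
For completeness, the link table I decided against would look something like this (just a sketch; the name task_workers is made up for illustration):

    CREATE TABLE task_workers (
        task_id    INT UNSIGNED NOT NULL,
        worker_id  INT UNSIGNED NOT NULL,
        PRIMARY KEY (task_id, worker_id)   -- also enforces "one worker_info per task"
    ) ENGINE=InnoDB;

    -- instead of appending to workers_used:
    INSERT IGNORE INTO task_workers (task_id, worker_id) VALUES ($task_id, $worker_id);

    -- instead of NOT IN($workers_used):
    SELECT w.*
    FROM worker_info w
    LEFT JOIN task_workers tw
           ON tw.worker_id = w.worker_id AND tw.task_id = $task_id
    WHERE tw.worker_id IS NULL
    ORDER BY w.last_used ASC
    LIMIT 1;

With the composite primary key, the "used only once per task" rule would be enforced by the database itself.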

The script then performs steps 2-6 ten times for each task before going back to step 1, selecting a new task and starting over.
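
Put together, one cron run therefore looks roughly like this (a sketch, with placeholders like $workers_used filled in by the PHP script):

    -- step 1: pick a task
    SELECT * FROM tasks ORDER BY times_run ASC LIMIT 1;

    -- steps 2-6, repeated 10 times for the selected task:
    LOCK TABLES worker_info WRITE;
    SELECT * FROM worker_info
    WHERE worker_id NOT IN ($workers_used)
    ORDER BY last_used ASC
    LIMIT 1;
    UPDATE worker_info SET last_used = NOW() WHERE worker_id = $worker_id;
    UNLOCK TABLES;
    -- ... PHP does its work with the selected worker_info ...
    UPDATE tasks
    SET times_run = times_run + 1,
        workers_used = IF(workers_used = '', '$worker_id', CONCAT(workers_used, ',$worker_id'))
    WHERE task_id = $task_id;
    -- then back to step 1 for the next task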

This setup has served me well for about a year, but now that I need 50+ PHP scripts active on this queue, I'm running into more and more performance problems. The queries issued by the PHP scripts take up to 20 seconds, and I can't scale any further: if I just run more PHP scripts, the MySQL server crashes. I don't want any data loss if the system crashes, so I write every change to the database as it happens. When I first built the system I also had problems with workers_used: with 10 PHP scripts working on one task, it happened very often that the same worker_info was used multiple times within a task, which I do not want.

That is why I introduced the LOCK, which fixed the duplicates, but I suspect it is now the bottleneck of the system: while one worker holds the table lock, the other 49 PHP workers have to wait, which is bad.

Now my questions are:

Is this implementation any good? Should I stick with it or throw it away and do something else?

Is this LOCK really my problem, or might something else be slowing down the system?

How can I improve this setup to make it a lot faster?

// Edit: As suggested by jeremycole:

I suppose I need to extend the worker_info table to implement the changes:

worker_info

worker_id  |  name    | hash       | tasks_owner | last_used
1          |  worker1 | d8f9zdf8z  | 1           | 2014-03-03 13:00:01
2          |  worker2 | odfi9dfu8  | NULL        | 2014-03-03 13:01:01
3          |  worker3 | sdz7std74  | NULL        | 2014-03-03 13:02:03
4          |  worker4 | duf8s763z  | NULL        | 2014-03-03 13:02:01
...

And then change the routine to:

SET autocommit=0 (so the following queries won't get autocommitted)

1. SELECT * FROM tasks ORDER BY times_run ASC LIMIT 1 (select a task to process)

2. START TRANSACTION

3. SELECT * FROM worker_info WHERE worker_id NOT IN($workers_used) AND tasks_owner IS NULL ORDER BY last_used ASC LIMIT 1 FOR UPDATE

4. UPDATE worker_info SET last_used = NOW(), tasks_owner = $task_id WHERE worker_id = $worker_id

5. COMMIT

Run the PHP routine and, if it was successful:

6. UPDATE tasks SET times_run = times_run + 1, workers_used = IF(workers_used = '', '$worker_id', CONCAT(workers_used, ',$worker_id')) WHERE task_id = $task_id

That should be it, or am I wrong somewhere? Is tasks_owner really needed, or would it be sufficient to just change the last_used date?
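
Written out as one session, my understanding of the revised flow is roughly this (a sketch, not tested code; placeholders are filled in by PHP):

    SET autocommit = 0;

    -- pick a task
    SELECT * FROM tasks ORDER BY times_run ASC LIMIT 1;

    START TRANSACTION;

    -- claim the least recently used, unclaimed worker_info for this task;
    -- FOR UPDATE row-locks the matching row instead of locking the whole table
    SELECT * FROM worker_info
    WHERE worker_id NOT IN ($workers_used)
      AND tasks_owner IS NULL
    ORDER BY last_used ASC
    LIMIT 1
    FOR UPDATE;

    UPDATE worker_info
    SET last_used = NOW(), tasks_owner = $task_id
    WHERE worker_id = $worker_id;

    COMMIT;

    -- ... PHP routine runs here ...

    UPDATE tasks
    SET times_run = times_run + 1,
        workers_used = IF(workers_used = '', '$worker_id', CONCAT(workers_used, ',$worker_id'))
    WHERE task_id = $task_id;
    COMMIT;   -- needed because autocommit is off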


OTHER TIPS

It may be useful to read my answer to another question about how to implement a job queue in MySQL here:

MySQL deadlocking issue with InnoDB

In short, using LOCK TABLES for this is quite unnecessary and unlikely to yield good results.
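
For illustration only, one generic way to hand out a row without LOCK TABLES is to claim it with a single atomic UPDATE and read it back afterwards, using a column like the tasks_owner from the edit above (this is a common pattern, not necessarily the exact scheme from the linked answer; check the affected row count before trusting LAST_INSERT_ID()):

    -- claim one free worker_info row atomically; no table lock needed
    UPDATE worker_info
    SET tasks_owner = $task_id,
        last_used   = NOW(),
        worker_id   = LAST_INSERT_ID(worker_id)   -- remember which row we claimed
    WHERE tasks_owner IS NULL
      AND worker_id NOT IN ($workers_used)
    ORDER BY last_used ASC
    LIMIT 1;

    -- if the UPDATE matched a row, fetch the claimed worker:
    SELECT * FROM worker_info WHERE worker_id = LAST_INSERT_ID();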
