Question

Hey. I use delayed_job for background processing. I have an 8-CPU server running MySQL, and I start 7 delayed_job processes:

RAILS_ENV=production script/delayed_job -n 7 start 

Q1: I'm wondering whether it is possible for 2 or more delayed_job processes to start processing the same job (the same row in the delayed_jobs table). I checked the code of the delayed_job plugin but cannot find the kind of locking I would expect (no LOCK TABLES or SELECT ... FOR UPDATE).

I think each process should lock the database table before executing an UPDATE on the locked_by column. Instead, the workers lock the record simply by updating the locked_by field (UPDATE delayed_jobs SET locked_by...). Is that really enough? Is no locking needed? Why? I know that UPDATE has higher priority than SELECT, but I don't think that matters in this case.

My understanding of the concurrent case is:

Process1: Get waiting job X. [OK]
Process2: Get waiting job X. [OK]
Process1: Update locked_by field. [OK]
Process2: Update locked_by field. [OK]
Process1: Get waiting job X. [Already processed]
Process2: Get waiting job X. [Already processed]

I think that in some cases several workers can read the same row and start processing the same job.
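The interleaving described in the question can be replayed step by step. This sketch assumes, as the question does, that the lock is taken by a plain read followed by an unconditional write; naive_lock_race and the hash row are hypothetical stand-ins for illustration, not the plugin's code. Under that assumption both workers end up believing they own the job:

```ruby
# Replays the question's interleaving for a read-then-write lock whose
# UPDATE has NO condition in its WHERE clause (hypothetical, for illustration).
# Returns the workers that think they got the lock and the final locked_by.
def naive_lock_race
  row = { id: 1, locked_by: nil } # hypothetical stand-in for one delayed_jobs row

  # Process1 and Process2 both read the row before either of them writes.
  p1_saw_unlocked = row[:locked_by].nil?
  p2_saw_unlocked = row[:locked_by].nil?

  winners = []
  # Each process then writes locked_by unconditionally and assumes success.
  if p1_saw_unlocked
    row[:locked_by] = "worker-1"
    winners << "worker-1"
  end
  if p2_saw_unlocked
    row[:locked_by] = "worker-2" # silently overwrites worker-1's lock
    winners << "worker-2"
  end
  [winners, row[:locked_by]]
end

winners, final_owner = naive_lock_race
puts "claimed by: #{winners.join(', ')}; row says: #{final_owner}"
# prints "claimed by: worker-1, worker-2; row says: worker-2"
```

The problem in this model is the gap between the read and the write; the answer below explains why delayed_job's actual UPDATE does not leave that gap.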

Q2: Are 7 delayed_job workers a good number for an 8-CPU server? Why or why not?

Thanks!


Solution

I think the answer to your question is in line 168 of 'lib/delayed_job/job.rb':

self.class.update_all(["locked_at = ?, locked_by = ?", now, worker], ["id = ? and (locked_at is null or locked_at < ?)", id, (now - max_run_time.to_i)])

Here the row is only updated if no other worker holds the lock (or the previous lock is older than max_run_time), and that condition is checked by the UPDATE statement itself rather than by a separate SELECT. A table lock or similar (which, by the way, would massively reduce the performance of your app) is not needed, since your DBMS ensures that a single statement executes in isolation from the effects of other statements. In your example, Process2 can't get the lock for job X: its UPDATE only changes the row if the job was not locked before, so once Process1's UPDATE has gone through, Process2's WHERE clause no longer matches and its update affects no rows.
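To see why a conditional UPDATE is enough, here is a minimal Ruby sketch of the same compare-and-set idea. It is not delayed_job's code: FakeJobsTable and lock_job are hypothetical names, an in-memory hash replaces the delayed_jobs table, and a Mutex stands in for the DBMS's guarantee that a single UPDATE statement is applied atomically. Seven threads race for the same job, and only the one whose update affects a row gets to run it. The 4-hour max_run_time is just an example value.

```ruby
# Hypothetical in-memory stand-in for the delayed_jobs table (NOT delayed_job's
# code). The Mutex plays the role of the DBMS guarantee that a single UPDATE
# statement is applied atomically.
class FakeJobsTable
  def initialize
    @rows = { 1 => { locked_at: nil, locked_by: nil } }
    @mutex = Mutex.new
  end

  # Mirrors the conditional update quoted from job.rb:
  #   SET locked_at = now, locked_by = worker
  #   WHERE id = ? AND (locked_at IS NULL OR locked_at < now - max_run_time)
  # Returns the number of affected rows (0 or 1).
  def lock_job(id, worker, now, max_run_time)
    @mutex.synchronize do
      row = @rows[id]
      free = row && (row[:locked_at].nil? || row[:locked_at] < now - max_run_time)
      if free
        row[:locked_at] = now
        row[:locked_by] = worker
        1
      else
        0
      end
    end
  end

  def owner(id)
    @mutex.synchronize { @rows[id][:locked_by] }
  end
end

table = FakeJobsTable.new
now = Time.now
max_run_time = 4 * 3600 # example value in seconds

# Seven workers race for job 1; a worker runs the job only if its
# update affected exactly one row.
threads = (1..7).map do |n|
  Thread.new do
    worker = "worker-#{n}"
    table.lock_job(1, worker, now, max_run_time) == 1 ? worker : nil
  end
end
won = threads.map(&:value).compact
first_owner = table.owner(1)
puts "winners: #{won.inspect}, row locked_by: #{first_owner}"

# Another worker cannot take the lock while it is fresh...
too_early = table.lock_job(1, "worker-8", now + 60, max_run_time)
# ...but can once locked_at is older than max_run_time (e.g. the owner crashed).
retaken = table.lock_job(1, "worker-8", now + max_run_time + 1, max_run_time)
puts "too_early=#{too_early} retaken=#{retaken}"
```

Which thread wins varies from run to run, but exactly one does. In a real deployment the atomicity comes from MySQL rather than from a Ruby mutex; the technique only requires that each worker checks how many rows its own UPDATE changed.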

To your second question: it depends. On an 8-CPU server that is dedicated to this work, 8 workers are a good starting point: since each worker is single-threaded, you should run one per core. Depending on your setup, more or fewer workers may be better; it heavily depends on your jobs. Do your jobs take advantage of multiple cores? Or do they spend most of their time waiting for external resources? You have to experiment with different settings and keep an eye on all the resources involved.
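As a starting point, the one-worker-per-core rule can be turned into the -n value for the start command from the question. This is only a sketch: starting_worker_count, delayed_job_start_command and the reserved_cores parameter are hypothetical names (reserved_cores is not a delayed_job option), and Etc.nprocessors simply reports the number of online processors. For I/O-bound jobs the right number still has to come from measurement.

```ruby
require "etc"

# Hypothetical helper: one worker per core, as a starting point for
# single-threaded, CPU-bound jobs. reserved_cores (an assumption, not a
# delayed_job option) leaves cores free for other services such as MySQL.
def starting_worker_count(cores: Etc.nprocessors, reserved_cores: 0)
  [cores - reserved_cores, 1].max # always run at least one worker
end

# Builds the start command from the question for a given worker count.
def delayed_job_start_command(workers)
  "RAILS_ENV=production script/delayed_job -n #{workers} start"
end

puts delayed_job_start_command(starting_worker_count)
puts delayed_job_start_command(starting_worker_count(cores: 8, reserved_cores: 1))
# second line prints "RAILS_ENV=production script/delayed_job -n 7 start"
```

Reserving one core reproduces the asker's choice of 7 workers; whether that beats 8 depends on how much CPU MySQL and the jobs actually use.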

Licensed under: CC-BY-SA with attribution