Came up with a solution that works. I have a base class called BaseJob that all of my delayed jobs inherit from:
class BaseJob
  attr_accessor :live_hash

  def before(job)
    # Ask the web app which revision of the code it is currently running
    resp = HTTParty.get("#{Rails.application.config.root_url}/revision")
    self.live_hash = resp.body.strip
  end

  def should_perform
    live_hash == GIT_REVISION
  end

  def perform
    safe_perform if should_perform
  end

  def safe_perform
    # override this method in subclasses
  end

  def success(job)
    if should_perform
      # log stats here about a success
    else
      # log stats here about a failure
      # enqueue a new job of the same kind
      new_job = Delayed::Job.new
      new_job.priority = job.priority
      new_job.handler = job.handler
      new_job.queue = job.queue
      new_job.run_at = job.run_at
      new_job.save
      job.delete
      # stop this consumer; the cron watchdog below will spawn a fresh one with new code
      %x(RAILS_ENV=#{Rails.env} ./script/delayed_job stop)
    end
  end
end
All job classes inherit from BaseJob and override safe_perform to actually do their work. A few assumptions about the above code:
- Rails.application.config.root_url points to the root of your app (e.g. www.myapp.com)
- There is a route exposed called /revision (e.g. www.myapp.com/revision)
- There is a global constant called GIT_REVISION that your app knows about
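For reference, the /revision route can be as simple as a Rack-style endpoint that returns the constant as plain text. This is a sketch, not the exact route from my app:

```ruby
# config/routes.rb -- hypothetical; any route that renders the running
# revision as plain text will satisfy the assumption above.
Rails.application.routes.draw do
  get '/revision', to: proc { |env|
    [200, { 'Content-Type' => 'text/plain' }, [GIT_REVISION]]
  }
end
```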
What I ended up doing was putting the output of git rev-parse HEAD in a file and deploying that file with the code. It gets loaded at startup, so it's available in the web app as well as in the delayed_job consumers.
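Concretely, the loading step can look like the following; this is a minimal, framework-free sketch (the path and SHA are made up) of reading a REVISION file the way a Rails initializer might set GIT_REVISION at boot:

```ruby
require 'tmpdir'

# Read the deployed revision from a REVISION file, as an initializer
# might do with: GIT_REVISION = load_git_revision(Rails.root.join('REVISION'))
def load_git_revision(path)
  File.read(path).strip
end

Dir.mktmpdir do |root|
  revision_file = File.join(root, 'REVISION')
  # Stand-in for the deploy-time step: git rev-parse HEAD > REVISION
  File.write(revision_file, "0b7c6a1d9e8f\n")
  puts load_git_revision(revision_file)  # => 0b7c6a1d9e8f
end
```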
When we deploy code via Capistrano, we no longer stop, start, or restart delayed_job consumers. We install a cronjob on consumer nodes that runs every minute and determines if a delayed_job process is running. If one isn't, then a new one will be spawned.
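The cron check itself can be a one-liner. This is a sketch that assumes a standard Capistrano layout (/var/www/myapp/current), a deploy user, and that nothing else on the box matches the delayed_job process name:

```shell
# /etc/cron.d/delayed_job_watchdog -- hypothetical; every minute, spawn a
# consumer if no delayed_job process is currently running.
* * * * * deploy pgrep -f delayed_job > /dev/null || (cd /var/www/myapp/current && RAILS_ENV=production ./script/delayed_job start)
```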
As a result of all of this, all of the following conditions are met:
- Pushing code doesn't wait on delayed_job to restart/force kill anymore. Existing jobs that are running are left alone when new code is pushed.
- When a job begins, we can detect whether the consumer is running old code. If it is, the job gets requeued and the consumer kills itself.
- When a delayed_job dies, a new one is spawned via a cronjob with new code (by the nature of starting delayed_job, it has new code).
- If you're paranoid about killing delayed_job consumers, install a nagios check that does the same thing as the cron job but alerts you when a delayed_job process hasn't been running for 5 minutes.