Question

I'm wondering what's the best way to gracefully restart delayed_job consumers after a new code push. I'm deploying with Capistrano, and I know there are commands to restart the workers, but if a job is currently running, the command either hangs (and my deploy takes forever) or forcefully quits the running job and I lose data.

Ideally I'd like my deploy to happen like this:

  1. Existing delayed_job consumer is running with version 1 code
  2. I run cap deploy and version 2 code is pushed out to the servers
  3. During the deploy, we touch a file to tell the delayed_job consumer to restart once it's done processing the current job. This could be done a bunch of different ways, but I was thinking it would be similar to how Passenger is gracefully restarted (a rough sketch follows this list)
  4. Existing delayed_job consumer continues to finish the current job with version 1 code
  5. Current job finishes, delayed_job consumer sees that it needs to restart itself before continuing to process jobs
  6. delayed_job consumer automatically restarts, now running version 2 code
  7. delayed_job consumer continues to process jobs, now running on version 2 code
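
To make step 3 concrete, here's roughly what I have in mind (the hook and the marker file name are just placeholders):

# Between jobs, compare a deploy marker's mtime against when this worker
# booted. delayed_job calls an after(job) hook on the payload once a job
# finishes, so this check runs only after the current job is done.
WORKER_BOOTED_AT = Time.now
RESTART_MARKER = "#{Rails.root}/tmp/restart.txt"

def after(job)
  if File.exist?(RESTART_MARKER) && File.mtime(RESTART_MARKER) > WORKER_BOOTED_AT
    # a deploy touched the marker after this worker booted: exit cleanly
    # so a supervisor can bring the worker back up on the new code
    exit(0)
  end
end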

I've tried inserting some code that checks the current revision before a job runs and restarts the worker if it's stale, but every time I do that, the worker just dies and doesn't actually restart anything. Sample code below:

def before(job)
  # check that this worker is running the same revision as the live site
  live_git_hash  = LIVE_REVISION
  local_git_hash = LOCAL_REVISION

  if live_git_hash != local_git_hash
    # environment to reload in (production, development, staging)
    environment = Rails.env

    # restart the delayed job system
    %x(RAILS_ENV=#{environment} ./script/delayed_job restart)
  end
end

It detects the version mismatch just fine, but the worker dies on the shell call. Any ideas?

Thanks!


Solution

Came up with a solution that works.

I have a base class that all of my delayed jobs inherit from called BaseJob:

class BaseJob
  attr_accessor :live_hash

  def before(job)
    # ask the live site which revision of the code it is running
    resp = HTTParty.get("#{Rails.application.config.root_url}/revision")
    self.live_hash = resp.body.strip
  end

  def should_perform
    # only do real work if this worker's code matches the live revision
    live_hash == GIT_REVISION
  end

  def perform
    safe_perform if should_perform
  end

  def safe_perform
    # override this method in subclasses
  end

  def success(job)
    if should_perform
      # log stats here about a success
    else
      # log stats here about a failure

      # re-enqueue a copy of the job so a fresh worker picks it up
      new_job = Delayed::Job.new
      new_job.priority = job.priority
      new_job.handler  = job.handler
      new_job.queue    = job.queue
      new_job.run_at   = job.run_at
      new_job.save
      job.delete

      # stop this worker; the cron job described below spawns a new one
      # that boots with the new code
      %x(RAILS_ENV=#{Rails.env} ./script/delayed_job stop)
    end
  end

end

All job classes inherit from BaseJob and override safe_perform to do their actual work; a hypothetical subclass is sketched just after this list. A few assumptions about the above code:

  • Rails.application.config.root_url points to the root of your app (i.e., www.myapp.com)
  • There is a route exposed called /revision (i.e., www.myapp.com/revision)
  • There is a global constant called GIT_REVISION that your app knows about
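
For illustration, a subclass would look something like this (the class name and body are made up):

class ExampleJob < BaseJob
  def safe_perform
    # do the actual work here
  end
end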

What I ended up doing was putting the output of git rev-parse HEAD into a REVISION file that gets deployed with the code. It's loaded on startup, so the revision is available to the web app as well as to the delayed_job consumers.
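
The supporting pieces might look something like this (the file names and the controller are my own choices, not required by anything):

# config/initializers/revision.rb -- the REVISION file is written at
# deploy time with: git rev-parse HEAD > REVISION
revision_file = Rails.root.join("REVISION")
GIT_REVISION = File.exist?(revision_file) ? File.read(revision_file).strip : "unknown"

# config/routes.rb
get "/revision" => "revisions#show"

# app/controllers/revisions_controller.rb
class RevisionsController < ApplicationController
  def show
    render :text => GIT_REVISION # render plain: GIT_REVISION on Rails 4.1+
  end
end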

When we deploy code via Capistrano, we no longer stop, start, or restart delayed_job consumers. Instead, we install a cron job on each consumer node that runs every minute and checks whether a delayed_job process is running; if one isn't, it spawns a new one.
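
The watchdog can be as simple as the sketch below (the script name, paths, and pgrep pattern are illustrative assumptions):

# script/worker_watchdog.rb -- run from cron every minute, e.g.:
# * * * * * cd /var/www/myapp/current && RAILS_ENV=production ruby script/worker_watchdog.rb

app_root = File.expand_path("../..", __FILE__)

# is a delayed_job worker already running?
unless system("pgrep -f delayed_job > /dev/null")
  # no worker found: start one, which boots with whatever code is deployed
  Dir.chdir(app_root) { system("./script/delayed_job start") }
end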

As a result of all of this, all of the following conditions are met:

  • Pushing code no longer waits on delayed_job to restart or force-kills it. Jobs that are already running are left alone when new code is pushed.
  • When a job begins, we can detect whether the consumer is running old code. If it is, the job gets requeued and the consumer kills itself.
  • When a delayed_job consumer dies, the cron job spawns a new one, which picks up the new code simply by starting fresh.
  • If you're paranoid about killing delayed_job consumers, install a Nagios check that does the same thing as the cron job but alerts you when a delayed_job process hasn't been running for 5 minutes.