Question

I have a site running a Rails application and Resque workers in production mode, on Ubuntu 9.10, Rails 2.3.4, ruby-ee 2010.01, and PostgreSQL 8.4.2.

The workers constantly raise errors: PGError: server closed the connection unexpectedly.

My best guess is that the master Resque process establishes a connection to the database while loading the Rails app classes (e.g. authlogic does that when User.acts_as_authentic is used), and that this connection becomes corrupted in a fork()ed process (on exit?), so the next forked children get a broken global ActiveRecord::Base.connection.

I could reproduce very similar behaviour with this sample code imitating the fork/processing cycle in a Resque worker. (AFAIK, users of libpq are advised to recreate connections in a forked process anyway; otherwise it's not safe.)
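The kind of breakage guessed at above can be imitated without PostgreSQL at all. The following is a separate, minimal sketch, not the sample linked in the question: a socket pair stands in for the client/server connection, and the single byte "X" is a hypothetical goodbye message loosely modeled on the PostgreSQL protocol's Terminate message. The method name simulate_shared_socket is made up for illustration. It assumes a Unix-like system where fork is available.

```ruby
require 'socket'

# Simulates one client/server session that is shared across fork().
# The "server" end stands in for a database backend: it ends the session
# as soon as it reads the hypothetical goodbye byte "X".
def simulate_shared_socket
  client, server = UNIXSocket.pair

  child_pid = fork do
    server.close
    # The child inherited `client`. Cleaning it up "politely" sends a
    # goodbye over the very socket the parent is still using.
    client.write("X")
    client.close
    exit!(0) # leave immediately, without running inherited at_exit handlers
  end
  Process.wait(child_pid)

  # Server side: sees the goodbye and closes the session.
  goodbye = server.read(1)
  server.close if goodbye == "X"

  # Parent side: it never closed its handle, yet the session is gone.
  # The read reports end-of-file (nil) instead of a reply.
  reply = client.read(1)
  client.close

  { goodbye: goodbye, parent_reply: reply }
end
```

Calling simulate_shared_socket returns a hash whose :goodbye entry is the byte the server received from the child and whose :parent_reply entry is what the parent got back afterwards. This is roughly the situation in which a real client would report that the server closed the connection unexpectedly.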

But the odd thing is that when I use pgbouncer or pgpool-II instead of a direct PostgreSQL connection, such errors do not appear.

So the question is: where and how should I dig to find out why it breaks with a plain connection but works with connection pools? Or is there a reasonable workaround?

Solution

When I created Nestor, I had the same kind of problem. The solution was to re-establish the connection in the forked process. See the relevant code at http://github.com/francois/nestor/blob/master/lib/nestor/mappers/rails/test/unit.rb#L162

From my very limited look at Resque code, I believe a call to #establish_connection should be done right about here: https://github.com/resque/resque/blob/master/lib/resque/worker.rb#L123

OTHER TIPS

After a bit of research and trial and error, here is a clarification of what gc mentioned, for anyone who comes across the same issue.

Resque.after_fork = Proc.new { ActiveRecord::Base.establish_connection }

The above code should be placed in /lib/tasks/resque.rake

For example:

require 'resque/tasks'

task "resque:setup" => :environment do
  ENV['QUEUE'] = '*'

  Resque.after_fork do |job|
    ActiveRecord::Base.establish_connection
  end

end

desc "Alias for resque:work (To run workers on Heroku)"
task "jobs:work" => "resque:work"

Hope this helps someone as much as it helped me.

You cannot pass a libpq connection across a fork() (or to a new thread) unless your application takes great care never to use it in conflicting ways (for example, a mutex around every single attempt to use it, and you must never close it). This applies equally to direct connections and to connections through pgbouncer. If it worked with pgbouncer, that was pure luck in missing a race condition for some reason, and it will eventually break.

If your program uses forking, you must create the connection after the fork.
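One way to follow that rule is to remember which process opened the connection and to open a fresh one whenever the current PID differs. The sketch below is only an illustration under stated assumptions: FakeConnection is a hypothetical stand-in for a real database handle, and PerProcessConnection and owner_pid_seen_by_child are made-up names, not part of libpq, ActiveRecord, or Resque. It assumes a Unix-like system where fork is available.

```ruby
# Hypothetical stand-in for a real database connection; it only records
# which process opened it.
class FakeConnection
  attr_reader :owner_pid

  def initialize
    @owner_pid = Process.pid
  end
end

# Hands out a connection that was opened by the current process.
# If the PID has changed since the connection was opened, we are in a
# forked child, so the inherited handle is dropped (never closed, since
# closing it could disturb the parent) and a new one is opened.
class PerProcessConnection
  def initialize(&factory)
    @factory = factory
    @connection = nil
  end

  def connection
    if @connection.nil? || @connection.owner_pid != Process.pid
      @connection = @factory.call
    end
    @connection
  end
end

# Forks a child, asks it for a connection, and returns the child's PID
# together with the PID that owns the connection the child ended up using.
def owner_pid_seen_by_child(holder)
  reader, writer = IO.pipe
  child_pid = fork do
    reader.close
    writer.puts holder.connection.owner_pid
    writer.close
    exit!(0) # leave immediately, without running inherited at_exit handlers
  end
  writer.close
  seen = reader.read.to_i
  reader.close
  Process.wait(child_pid)
  [child_pid, seen]
end
```

With holder = PerProcessConnection.new { FakeConnection.new }, the parent keeps using the connection it opened, while each forked child transparently gets its own. In a Rails and Resque setup, calling ActiveRecord::Base.establish_connection in an after_fork hook plays the same role.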

If the application runs under Phusion Passenger, change the Apache configuration and add:

PassengerSpawnMethod conservative

I had this issue with all of my Mailer classes, and I needed to call ActiveRecord::Base.verify_active_connections! within the mailer methods to ensure a connection was available.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow