Why is pgsql sometimes not listening for the first few seconds after start even though "service postgres status" returns OK?

StackOverflow https://stackoverflow.com/questions/20500105

Question

I have a web app that uses postgresql 9.0 with some plperl functions that call custom libraries of mine. So, when I want to start fresh as if just released, my build process for my development area does basically this:

  1. dumps data and roles from production
  2. drops dev data and roles
  3. restores production data and roles onto dev
  4. restarts postgresql so that any cached versions of my custom libraries are flushed and newly-changed ones will be picked up
  5. applies my dev delta
  6. vacuums

Since switching my app's stack from win32 to CentOS, I now sometimes (it seems, if and only if I haven't run this build process in "a while", perhaps at least a day) get an error when my build script tries to apply the delta:

psql: could not connect to server: No such file or directory
Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

Specifically, what's failing to execute at the shell level is this:

psql --host=$host -U $superuser -p $port -d $db -f "$delta_filename.sql"

If, immediately after seeing this error, I try to connect to the dev database with psql, I can do so with no trouble. Also, if I just re-run the build script, it works fine the second time, every time I've encountered this. Acceptable workaround, but is the underlying cause something to be concerned about?

So far in my attempts to debug this, I inserted a step just after the server restart (which of course reports OK shutdown, OK startup) whereby I check the results of service postgresql-dev status in a loop, waiting 2 seconds between tries if it fails. On my latest build script run, said loop succeeds on the first try--status returns "is running"--but then applying the delta still fails with the above connection error. Again, second try succeeds, as does connecting via psql outside the script just after it fails.

My next debug attempt was to sleep for 5 seconds before the first status check and see what happens. So far this seems to solve the problem.

So why does pgsql not listen on the socket for up to 5 seconds after it starts [OK] and reports status "is running", unless it has "recently" been restarted?

Solution

The status check only checks whether the process is running. It doesn't check whether you can connect. There can be any amount of time between starting the process and the process being ready to accept connections. It's usually a few seconds, but it could be longer. If you need to cope with this, you need to script it so that it checks whether it is possible to connect before proceeding. You could argue that the CentOS package should do this for you, but it doesn't.
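
As a rough sketch of such a check (not something the CentOS package provides), a build script could retry a trivial query until the server answers. The helper name wait_for, the retry count and the delay are assumptions made for illustration; the connection variables are the ones from the question. Newer PostgreSQL releases (9.3 and later) ship a pg_isready utility that could serve as the probe instead, but it is not available on 9.0.

```shell
#!/bin/sh
# wait_for (hypothetical helper): retry a command until it succeeds
# or the attempts run out.
# Usage: wait_for MAX_TRIES DELAY_SECONDS command [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
wait_for() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    # Run the probe quietly; only its exit status matters here.
    while ! "$@" >/dev/null 2>&1; do
        if [ "$try" -ge "$max_tries" ]; then
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    return 0
}

# Possible use right after the restart step (values are illustrative):
#   wait_for 30 1 psql --host="$host" -U "$superuser" -p "$port" -d "$db" -c 'SELECT 1' || exit 1
#   psql --host="$host" -U "$superuser" -p "$port" -d "$db" -f "$delta_filename.sql"
```

Probing with the same psql connection arguments that the delta step uses means the loop tests exactly what is about to be needed, rather than whether the process exists.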

Actually, I think in your case there is no reason to do a full restart. Unless you are loading libraries with shared_preload_libraries, it is sufficient to close and reopen the database connection: a new session picks up the new libraries.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow