`pg_restore` with `jobs` flag results in `pg_restore: error: a worker process died unexpectedly`

dba.stackexchange https://dba.stackexchange.com/questions/257398

  •  22-02-2021

Question

I have a script which runs the following:

pg_restore tmp/latest.backup --verbose --clean --no-acl --no-owner --dbname hub_development --jobs=12

This frequently fails with the following error:

pg_restore: error: could not find block ID 4584 in archive -- possibly due to out-of-order restore request, which cannot be handled due to lack of data offsets in archive

pg_restore: error: a worker process died unexpectedly

This error, in turn, means that tables which should have indices, primary keys, etc. end up not having them. For example, when run without multiple cores, our users table looks like this, as expected:


                                           Table "public.users"
       Column       |            Type             | Collation | Nullable |              Default              
--------------------+-----------------------------+-----------+----------+-----------------------------------
 id                 | integer                     |           | not null | nextval('users_id_seq'::regclass)
 created_at         | timestamp without time zone |           | not null | 
 updated_at         | timestamp without time zone |           | not null | 
 email              | character varying           |           | not null | 
 confirmation_token | character varying(128)      |           |          | 
 name               | character varying           |           | not null | ''::character varying
 user_type          | character varying           |           |          | 
 encrypted_password | character varying(128)      |           |          | 
 remember_token     | character varying(128)      |           |          | 
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "index_users_on_email" btree (email)
    "index_users_on_remember_token" btree (remember_token)
Referenced by:
    TABLE "project_feedback_users" CONSTRAINT "fk_rails_08af49ba47" FOREIGN KEY (user_id) REFERENCES users(id)
    TABLE "client_reviews" CONSTRAINT "fk_rails_8fc606dbea" FOREIGN KEY (user_id) REFERENCES users(id)

When run with multiple cores, the table looks like this:


hub_development=# \d users
                                            Table "public.users"
       Column       |            Type             | Collation | Nullable |              Default              
--------------------+-----------------------------+-----------+----------+-----------------------------------
 id                 | integer                     |           | not null | nextval('users_id_seq'::regclass)
 created_at         | timestamp without time zone |           | not null | 
 updated_at         | timestamp without time zone |           | not null | 
 email              | character varying           |           | not null | 
 confirmation_token | character varying(128)      |           |          | 
 name               | character varying           |           | not null | ''::character varying
 user_type          | character varying           |           |          | 
 encrypted_password | character varying(128)      |           |          | 
 remember_token     | character varying(128)      |           |          | 

Based on this, I concluded that the worker which created the table was not also the one responsible for adding the indices and foreign keys, that a second worker attempted to run before the first had finished, and that this caused the errors I observed.
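
As a sanity check on that theory, the archive's table of contents can be listed with pg_restore -l; in a custom-format dump the table data and each index, primary key, and foreign key are separate entries that parallel workers pick up independently. The path below is just my archive from above:

# List the archive's table of contents; the users table data and its
# indexes/constraints show up as separate entries
pg_restore -l tmp/latest.backup | grep -i users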

The script works fine when I remove the --jobs=12 flag, so the worst-case fix is to simply do that.

However, for my own education, I'd love to know whether there's a solution which preserves our ability to use multiple cores to parallelize the DB restore while avoiding the out-of-order restore requests.

The error mentions that out-of-order restore requests can't be handled due to lack of data offsets in the archive. Would adding those data offsets solve the problem in the way I described? If so, how would I go about that, and are there any disadvantages to doing so?

I'm not a DB admin and my knowledge here is limited, so please let me know if I haven't provided enough information to answer the question.

My local version of Postgres is 12.1, and the data is coming from a Rails app hosted on Heroku. Here's the result of heroku pg:info:

=== HEROKU_POSTGRESQL_BRONZE_URL, DATABASE_URL
Plan:                  Standard 0
Status:                Available
Data Size:             3.38 GB
Tables:                44
PG Version:            11.5
Connections:           22/120
Connection Pooling:    Available
Credentials:           1
Fork/Follow:           Available
Rollback:              earliest from 2020-01-10 18:17 UTC
Created:               2019-10-29 18:20 UTC
Region:                us
Data Encryption:       In Use
Continuous Protection: On
Maintenance:           not required
Maintenance window:    Wednesdays 18:00 to 22:00 UTC
Add-on:                postgresql-metric-02684

Solution

UPDATE: the "a worker process died unexpectedly" issue was fixed upstream in PostgreSQL 12.4! Upgrade to get the fix.
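
For reference, the client tools report their versions directly, so you can confirm what you're actually running locally:

pg_restore --version
psql --version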

For the best compatibility and performance with pg_restore, have pg_dump write the dump file to a local file on disk instead of to an unseekable file descriptor such as a pipe.
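
Concretely, pg_dump can only record data offsets in the archive's table of contents when its output is seekable; a pipe is not, which is exactly the "lack of data offsets" the error complains about. A sketch, where hub_production and backup-host are illustrative placeholders:

# Good: -f writes the custom-format archive straight to a seekable file,
# so per-block data offsets get recorded and parallel restore can seek
pg_dump -Fc --no-acl --no-owner -f tmp/latest.backup hub_production

# Risky: the output is a pipe, so pg_dump cannot seek back to record
# offsets, and a later pg_restore --jobs=N may fail as described
pg_dump -Fc hub_production | ssh backup-host 'cat > latest.backup'

Note that offsets are fixed at dump time; nothing done at download or restore time can add them to an existing archive.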

Other tips

I had this issue. For me the solution was to use an older version of postgres in development. Our production server was using 9.x or 10.x but I was trying to restore using postgres 12.x. Downgrading to 10.x worked for me.

At first blush, it looks like a file system issue where pg_restore hit EOF while reading the archive.

See https://github.com/postgres/postgres/blob/7a9c9ce6411720c2bbeaf6e64855d4263c47ea80/src/bin/pg_dump/pg_backup_custom.c#L460

But you say it works fine without --jobs=12, which makes this look like a synchronization issue; then again, that should show up far more often, with many more people reporting it. (I would experiment with --jobs=2, as shown below.)
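
That is, the same command from the question with only the worker count reduced:

pg_restore tmp/latest.backup --verbose --clean --no-acl --no-owner --dbname hub_development --jobs=2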

My guess, then, is that it has to do with the OS. Maybe the maximum number of open files has been hit; normally I would look in /var/log/messages, but you are on Heroku. (You can check from the shell, as below.) For more detail on max open files, see: https://www.postgresql.org/docs/12/kernel-resources.html#id-1.6.5.6.5
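
On Linux or macOS, a quick way to inspect the limit from the shell that launches pg_restore (exact values and syntax vary by platform, so treat this as a sketch):

# Soft limit on open file descriptors for the current shell
ulimit -n
# Hard ceiling the soft limit may be raised to
ulimit -Hn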

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange