Question

I'm using Rails 4's template-caching features extensively. Lots of nested templates and touch: true on lots of models. Overall it has proven to be a comprehensive solution that is easy to reason about.

I recently implemented a feature where multiple background jobs are created which iterate through hundreds of objects and create other objects in relation to them. When each of these objects is created, the same user is touched.

So, these two jobs run in parallel:

  • job A: serially creates hundreds of objects, each time touching User 123
  • job B: serially creates hundreds of objects, each time touching User 123

Upon deploying this feature, I discovered that sometimes postgres detects a deadlock and cancels two queries which are waiting for one another. The query shown in the error is always the touch query (update User 123's updated_at with the current time).

pid: 33541 tid: 0 vtid:  LOG:  database system was shut down at 2014-03-27 19:57:11 EDT
pid: 33545 tid: 0 vtid:  LOG:  autovacuum launcher started
pid: 33539 tid: 0 vtid:  LOG:  database system is ready to accept connections
pid: 33626 tid: 0 vtid: 2/5 LOG:  statement: set client_encoding to 'UTF8'
pid: 33626 tid: 0 vtid: 2/6 LOG:  statement: set client_encoding to 'unicode'
pid: 33626 tid: 0 vtid: 2/7 LOG:  statement: SET client_min_messages TO 'warning'
pid: 33626 tid: 0 vtid: 2/8 LOG:  statement: SHOW client_min_messages
pid: 33626 tid: 0 vtid: 2/9 LOG:  statement: SET client_min_messages TO 'panic'
pid: 33626 tid: 0 vtid: 2/10 LOG:  statement: SET standard_conforming_strings = on
pid: 33626 tid: 0 vtid: 2/11 LOG:  statement: SET client_min_messages TO 'warning'
pid: 33626 tid: 0 vtid: 2/12 LOG:  statement: SET time zone 'UTC'
pid: 33626 tid: 0 vtid: 2/13 LOG:  statement: SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
pid: 33626 tid: 0 vtid: 2/14 LOG:  statement: SHOW TIME ZONE
pid: 33626 tid: 0 vtid: 2/15 LOG:  statement: SELECT 1
pid: 33626 tid: 0 vtid: 2/16 LOG:  statement:               SELECT COUNT(*)
                  FROM pg_class c
                  LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
                  WHERE c.relkind in ('v','r')
                  AND c.relname = 'users'
                  AND n.nspname = ANY (current_schemas(false))

pid: 33626 tid: 0 vtid: 2/17 LOG:  statement:             SELECT attr.attname
                FROM pg_attribute attr
                INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1]
                WHERE cons.contype = 'p'
                  AND cons.conrelid = '"users"'::regclass

pid: 33626 tid: 0 vtid: 2/18 LOG:  statement: SELECT  "users".* FROM "users"   ORDER BY "users"."id" ASC LIMIT 1
pid: 33626 tid: 0 vtid: 2/19 LOG:  statement:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
                    FROM pg_attribute a LEFT JOIN pg_attrdef d
                      ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                   WHERE a.attrelid = '"users"'::regclass
                     AND a.attnum > 0 AND NOT a.attisdropped
                   ORDER BY a.attnum

pid: 33627 tid: 0 vtid: 3/2 LOG:  statement: set client_encoding to 'UTF8'
pid: 33627 tid: 0 vtid: 3/3 LOG:  statement: set client_encoding to 'unicode'
pid: 33627 tid: 0 vtid: 3/4 LOG:  statement: SET client_min_messages TO 'warning'
pid: 33627 tid: 0 vtid: 3/5 LOG:  statement: SHOW client_min_messages
pid: 33627 tid: 0 vtid: 3/6 LOG:  statement: SET client_min_messages TO 'panic'
pid: 33627 tid: 0 vtid: 3/7 LOG:  statement: SET standard_conforming_strings = on
pid: 33627 tid: 0 vtid: 3/8 LOG:  statement: SET client_min_messages TO 'warning'
pid: 33627 tid: 0 vtid: 3/9 LOG:  statement: SET time zone 'UTC'
pid: 33627 tid: 0 vtid: 3/10 LOG:  statement: SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
pid: 33627 tid: 0 vtid: 3/11 LOG:  statement: SHOW TIME ZONE
pid: 33627 tid: 0 vtid: 3/12 LOG:  statement: SELECT 1
pid: 33627 tid: 0 vtid: 3/13 LOG:  statement:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
                    FROM pg_attribute a LEFT JOIN pg_attrdef d
                      ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                   WHERE a.attrelid = '"users"'::regclass
                     AND a.attnum > 0 AND NOT a.attisdropped
                   ORDER BY a.attnum

pid: 33627 tid: 0 vtid: 3/14 LOG:  statement:               SELECT COUNT(*)
                  FROM pg_class c
                  LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
                  WHERE c.relkind in ('v','r')
                  AND c.relname = 'users'
                  AND n.nspname = ANY (current_schemas(false))

pid: 33627 tid: 0 vtid: 3/15 LOG:  statement:             SELECT attr.attname
                FROM pg_attribute attr
                INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1]
                WHERE cons.contype = 'p'
                  AND cons.conrelid = '"users"'::regclass

pid: 33627 tid: 0 vtid: 3/16 LOG:  statement: SELECT  id FROM "users"  WHERE (bared_at IS NOT NULL)  ORDER BY "users"."id" ASC LIMIT 1000
pid: 33628 tid: 0 vtid: 4/2 LOG:  statement: set client_encoding to 'UTF8'
pid: 33628 tid: 0 vtid: 4/3 LOG:  statement: set client_encoding to 'unicode'
pid: 33628 tid: 0 vtid: 4/4 LOG:  statement: SET client_min_messages TO 'warning'
pid: 33628 tid: 0 vtid: 4/5 LOG:  statement: SHOW client_min_messages
pid: 33628 tid: 0 vtid: 4/6 LOG:  statement: SET client_min_messages TO 'panic'
pid: 33628 tid: 0 vtid: 4/7 LOG:  statement: SET standard_conforming_strings = on
pid: 33628 tid: 0 vtid: 4/8 LOG:  statement: SET client_min_messages TO 'warning'
pid: 33628 tid: 0 vtid: 4/9 LOG:  statement: SET time zone 'UTC'
pid: 33628 tid: 0 vtid: 4/10 LOG:  statement: SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
pid: 33628 tid: 0 vtid: 4/11 LOG:  statement: SHOW TIME ZONE
pid: 33628 tid: 0 vtid: 4/12 LOG:  statement: SELECT 1
pid: 33628 tid: 0 vtid: 4/13 LOG:  statement: SELECT  id FROM "users"  WHERE (bared_at IS NOT NULL)  ORDER BY "users"."id" ASC LIMIT 1000
pid: 33627 tid: 0 vtid: 3/17 LOG:  statement:               SELECT COUNT(*)
                  FROM pg_class c
                  LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
                  WHERE c.relkind in ('v','r')
                  AND c.relname = 'monkeys'
                  AND n.nspname = ANY (current_schemas(false))

pid: 33628 tid: 0 vtid: 4/14 LOG:  statement:               SELECT COUNT(*)
                  FROM pg_class c
                  LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
                  WHERE c.relkind in ('v','r')
                  AND c.relname = 'monkeys'
                  AND n.nspname = ANY (current_schemas(false))

pid: 33627 tid: 0 vtid: 3/18 LOG:  statement:             SELECT attr.attname
                FROM pg_attribute attr
                INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1]
                WHERE cons.contype = 'p'
                  AND cons.conrelid = '"monkeys"'::regclass

pid: 33628 tid: 0 vtid: 4/15 LOG:  statement:             SELECT attr.attname
                FROM pg_attribute attr
                INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1]
                WHERE cons.contype = 'p'
                  AND cons.conrelid = '"monkeys"'::regclass

pid: 33627 tid: 0 vtid: 3/19 LOG:  statement:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
                    FROM pg_attribute a LEFT JOIN pg_attrdef d
                      ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                   WHERE a.attrelid = '"users_monkeys"'::regclass
                     AND a.attnum > 0 AND NOT a.attisdropped
                   ORDER BY a.attnum

pid: 33628 tid: 0 vtid: 4/16 LOG:  statement:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
                    FROM pg_attribute a LEFT JOIN pg_attrdef d
                      ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                   WHERE a.attrelid = '"users_monkeys"'::regclass
                     AND a.attnum > 0 AND NOT a.attisdropped
                   ORDER BY a.attnum

pid: 33627 tid: 0 vtid: 3/20 LOG:  statement:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
                    FROM pg_attribute a LEFT JOIN pg_attrdef d
                      ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                   WHERE a.attrelid = '"monkeys"'::regclass
                     AND a.attnum > 0 AND NOT a.attisdropped
                   ORDER BY a.attnum

pid: 33628 tid: 0 vtid: 4/17 LOG:  statement:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
                    FROM pg_attribute a LEFT JOIN pg_attrdef d
                      ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                   WHERE a.attrelid = '"monkeys"'::regclass
                     AND a.attnum > 0 AND NOT a.attisdropped
                   ORDER BY a.attnum

pid: 33627 tid: 0 vtid: 3/21 LOG:  statement: SHOW search_path
pid: 33628 tid: 0 vtid: 4/18 LOG:  statement: SHOW search_path
pid: 33627 tid: 0 vtid: 3/23 LOG:  execute a1: SELECT COUNT(*) FROM "monkeys" INNER JOIN "users_monkeys" ON "monkeys"."id" = "users_monkeys"."monkey_id" WHERE "users_monkeys"."user_id" = $1 AND (((monkeys.name = 'recruiter') AND (monkeys.resource_type IS NULL) AND (monkeys.resource_id IS NULL)))
pid: 33627 tid: 0 vtid: 3/23 DETAIL:  parameters: $1 = '2'
pid: 33628 tid: 0 vtid: 4/20 LOG:  execute a1: SELECT COUNT(*) FROM "monkeys" INNER JOIN "users_monkeys" ON "monkeys"."id" = "users_monkeys"."monkey_id" WHERE "users_monkeys"."user_id" = $1 AND (((monkeys.name = 'recruiter') AND (monkeys.resource_type IS NULL) AND (monkeys.resource_id IS NULL)))
pid: 33628 tid: 0 vtid: 4/20 DETAIL:  parameters: $1 = '2'
pid: 33627 tid: 0 vtid: 3/24 LOG:  statement: SELECT  "foos".* FROM "foos"  WHERE "foos"."user_id" = 2 AND "foos"."fooer_id" = 1 LIMIT 1
pid: 33628 tid: 0 vtid: 4/21 LOG:  statement: SELECT  "foos".* FROM "foos"  WHERE "foos"."user_id" = 1 AND "foos"."fooer_id" = 2 LIMIT 1
pid: 33628 tid: 0 vtid: 4/22 LOG:  statement:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
                    FROM pg_attribute a LEFT JOIN pg_attrdef d
                      ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                   WHERE a.attrelid = '"foos"'::regclass
                     AND a.attnum > 0 AND NOT a.attisdropped
                   ORDER BY a.attnum

pid: 33627 tid: 0 vtid: 3/25 LOG:  statement:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
                    FROM pg_attribute a LEFT JOIN pg_attrdef d
                      ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                   WHERE a.attrelid = '"foos"'::regclass
                     AND a.attnum > 0 AND NOT a.attisdropped
                   ORDER BY a.attnum

pid: 33628 tid: 0 vtid: 4/23 LOG:  statement:               SELECT COUNT(*)
                  FROM pg_class c
                  LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
                  WHERE c.relkind in ('v','r')
                  AND c.relname = 'foos'
                  AND n.nspname = ANY (current_schemas(false))

pid: 33627 tid: 0 vtid: 3/26 LOG:  statement:               SELECT COUNT(*)
                  FROM pg_class c
                  LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
                  WHERE c.relkind in ('v','r')
                  AND c.relname = 'foos'
                  AND n.nspname = ANY (current_schemas(false))

pid: 33628 tid: 0 vtid: 4/24 LOG:  statement:             SELECT attr.attname
                FROM pg_attribute attr
                INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1]
                WHERE cons.contype = 'p'
                  AND cons.conrelid = '"foos"'::regclass

pid: 33627 tid: 0 vtid: 3/27 LOG:  statement:             SELECT attr.attname
                FROM pg_attribute attr
                INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1]
                WHERE cons.contype = 'p'
                  AND cons.conrelid = '"foos"'::regclass

pid: 33628 tid: 0 vtid: 4/25 LOG:  statement: BEGIN
pid: 33627 tid: 0 vtid: 3/28 LOG:  statement: BEGIN
pid: 33628 tid: 0 vtid: 4/25 LOG:  statement: SELECT  1 AS one FROM "foos"  WHERE ("foos"."user_id" = 1 AND "foos"."id" != 4 AND "foos"."fooer_id" = 2) LIMIT 1
pid: 33627 tid: 0 vtid: 3/28 LOG:  statement: SELECT  1 AS one FROM "foos"  WHERE ("foos"."user_id" = 2 AND "foos"."id" != 3 AND "foos"."fooer_id" = 1) LIMIT 1
pid: 33628 tid: 0 vtid: 4/25 LOG:  statement:               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
                         pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
                    FROM pg_attribute a LEFT JOIN pg_attrdef d
                      ON a.attrelid = d.adrelid AND a.attnum = d.adnum
                   WHERE a.attrelid = '"users"'::regclass
                     AND a.attnum > 0 AND NOT a.attisdropped
                   ORDER BY a.attnum

pid: 33627 tid: 0 vtid: 3/28 LOG:  execute a2: SELECT  "users".* FROM "users"  WHERE "users"."id" = $1 LIMIT 1
pid: 33627 tid: 0 vtid: 3/28 DETAIL:  parameters: $1 = '2'
pid: 33627 tid: 0 vtid: 3/28 LOG:  statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.619282' WHERE "users"."id" = 2
pid: 33628 tid: 0 vtid: 4/25 LOG:  execute a2: SELECT  "users".* FROM "users"  WHERE "users"."id" = $1 LIMIT 1
pid: 33628 tid: 0 vtid: 4/25 DETAIL:  parameters: $1 = '1'
pid: 33627 tid: 6723 vtid: 3/28 LOG:  execute a2: SELECT  "users".* FROM "users"  WHERE "users"."id" = $1 LIMIT 1
pid: 33627 tid: 6723 vtid: 3/28 DETAIL:  parameters: $1 = '1'
pid: 33628 tid: 0 vtid: 4/25 LOG:  statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.627175' WHERE "users"."id" = 1
pid: 33627 tid: 6723 vtid: 3/28 LOG:  statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.628983' WHERE "users"."id" = 1
pid: 33628 tid: 6724 vtid: 4/25 LOG:  execute a2: SELECT  "users".* FROM "users"  WHERE "users"."id" = $1 LIMIT 1
pid: 33628 tid: 6724 vtid: 4/25 DETAIL:  parameters: $1 = '2'
pid: 33628 tid: 6724 vtid: 4/25 LOG:  statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.632111' WHERE "users"."id" = 2
pid: 33627 tid: 6723 vtid: 3/28 ERROR:  deadlock detected
pid: 33627 tid: 6723 vtid: 3/28 DETAIL:  Process 33627 waits for ShareLock on transaction 6724; blocked by process 33628.
    Process 33628 waits for ShareLock on transaction 6723; blocked by process 33627.
    Process 33627: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.628983' WHERE "users"."id" = 1
    Process 33628: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.632111' WHERE "users"."id" = 2

I am surprised that this deadlock occurs, because neither the row nor the table are being locked. Shouldn't the second transaction simply wait for the first to finish?

update: minimal sql-only reproduction of bug here: https://gist.github.com/jjb/9823023

Était-ce utile?

La solution

Your original log was incomplete, because the first UPDATEs didn't have a transaction ID, so they weren't included in the original log.

Now that the log with virtual txids is there, you can see that PostgreSQL is behaving exactly as designed.

The relevant lines are:

pid: 33627 tid: 0 vtid: 3/28 LOG:  statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.619282' WHERE "users"."id" = 2

pid: 33628 tid: 0 vtid: 4/25 LOG:  statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.627175' WHERE "users"."id" = 1

pid: 33627 tid: 6723 vtid: 3/28 LOG:  statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.628983' WHERE "users"."id" = 1

pid: 33628 tid: 6724 vtid: 4/25 LOG:  statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.632111' WHERE "users"."id" = 2

pid: 33627 tid: 6723 vtid: 3/28 ERROR:  deadlock detected
pid: 33627 tid: 6723 vtid: 3/28 DETAIL:  Process 33627 waits for ShareLock on transaction 6724; blocked by process 33628.
    Process 33628 waits for ShareLock on transaction 6723; blocked by process 33627.
    Process 33627: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.628983' WHERE "users"."id" = 1
    Process 33628: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.632111' WHERE "users"."id" = 2

Here:

  • 3/28 locks id=2 for update
  • 4/25 locks id=1 for update
  • 3/28 attempts to lock id=1 for update, blocking on the lock held by 4/25
  • 4/25 attempts to lock id=2 for update, blocking on the lock held by 3/28

At this point neither transaction can make progress, so PostgreSQL aborts one of them.

To prevent this from happening, your application must ensure that it always acquires locks in the same order. Where that is not possible, it must try to only work with one dependent object in any given transaction, so ordering issues cannot arise. Or it must be prepared to handle deadlocks by trapping the exception and re-issuing the transaction.

To learn more, see Explicit locking in the manual.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top