Question

I'm running into an issue where two connections from one user to my PostgreSQL server have been running for about 4 hours and have been sitting in a COMMIT state for quite some time (at least the hour I have been watching them). These connections are blocking other queries from running, but are not themselves blocked.

Here are the two connections in question.

postgres=# select * from pg_stat_activity where usename = 'xxxxx';
 datid | datname | procpid | usesysid | usename | current_query | waiting |          xact_start           |          query_start          |         backend_start         |  client_addr  | client_port
-------+---------+---------+----------+---------+---------------+---------+-------------------------------+-------------------------------+-------------------------------+---------------+-------------
 20394 | xxxxxx  |   17509 |    94858 | xxxxx   | COMMIT        | f       | 2014-01-30 05:51:11.311363-05 | 2014-01-30 05:51:12.042515-05 | 2014-01-30 05:51:11.294444-05 | xx.xx.xxx.xxx |       63531
 20394 | xxxxxx  |    9593 |    94858 | xxxxx   | COMMIT        | f       | 2014-01-30 06:45:17.032651-05 | 2014-01-30 06:45:17.694533-05 | 2014-01-30 06:45:16.992576-05 | xx.xx.xxx.xxx |       63605

PID 9593 is the most problematic one; it is the one other users are getting blocked by. As far as the user admits, he truncated his table and then did inserts in batches of 1,000 rows, committing after each batch.
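
For clarity, this is roughly the pattern he described; the table and column names below are made up for illustration:

TRUNCATE his_table;                       -- takes an ACCESS EXCLUSIVE lock on the table
BEGIN;
INSERT INTO his_table (id, payload)
SELECT i, 'row ' || i
  FROM generate_series(1, 1000) AS i;     -- one batch of 1,000 rows
COMMIT;
-- ...repeat BEGIN / INSERT / COMMIT for each subsequent batch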

Currently this PID shows the following locks:

postgres=# select * from pg_locks where pid = 9593;
   locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid  |        mode         | granted
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+------+---------------------+---------
 relation      |    20394 | 29173472 |      |       |            |               |         |       |          | 261/0              | 9593 | AccessExclusiveLock | t
 relation      |    20394 | 27794470 |      |       |            |               |         |       |          | 261/0              | 9593 | RowExclusiveLock    | t
 relation      |    20394 | 27794470 |      |       |            |               |         |       |          | 261/0              | 9593 | ShareLock           | t
 relation      |    20394 | 27794470 |      |       |            |               |         |       |          | 261/0              | 9593 | AccessExclusiveLock | t
 virtualxid    |          |          |      |       | 261/503292 |               |         |       |          | 261/0              | 9593 | ExclusiveLock       | t
 transactionid |          |          |      |       |            |     503213304 |         |       |          | 261/0              | 9593 | ExclusiveLock       | t

I can't kill this PID (nothing happens when I issue the kill command). I'm not sure how to diagnose this further, let alone resolve it.
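
By "kill" I mean the usual signal-based options, roughly:

select pg_cancel_backend(9593);      -- sends SIGINT, asks the backend to abort its current query
select pg_terminate_backend(9593);   -- sends SIGTERM, asks the backend to exit (added in 8.4)
-- an OS-level "kill -INT 9593" or "kill -TERM 9593" sends the same signals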

Any input, anyone?

Running PostgreSQL 8.4 on an Ubuntu Linux server.

EDIT:

Since I found other connections in a similar state, with the COMMIT hanging, I looked further and found the following in the server logs. It is a kernel stack trace for one of the stuck backends, and it shows the process blocked inside an fsync() call, waiting on page writeback:

Jan 30 02:29:45 server001 kernel: [3521062.240540] postgres      D 0000000000000000     0 23220   8154 0x00000004
Jan 30 02:29:45 server001 kernel: [3521062.240550]  ffff8800174c9d08 0000000000000082 ffff88041cd24728 0000000000015880
Jan 30 02:29:45 server001 kernel: [3521062.240559]  ffff8806c678b110 0000000000015880 0000000000015880 0000000000015880
Jan 30 02:29:45 server001 kernel: [3521062.240567]  0000000000015880 ffff8806c678b110 0000000000015880 0000000000015880
Jan 30 02:29:45 server001 kernel: [3521062.240575] Call Trace:
Jan 30 02:29:45 server001 kernel: [3521062.240582]  [<ffffffff810da010>] ? sync_page+0x0/0x50
Jan 30 02:29:45 server001 kernel: [3521062.240590]  [<ffffffff81528488>] io_schedule+0x28/0x40
Jan 30 02:29:45 server001 kernel: [3521062.240596]  [<ffffffff810da04d>] sync_page+0x3d/0x50
Jan 30 02:29:45 server001 kernel: [3521062.240603]  [<ffffffff815289a7>] __wait_on_bit+0x57/0x80
Jan 30 02:29:45 server001 kernel: [3521062.240610]  [<ffffffff810da1be>] wait_on_page_bit+0x6e/0x80
Jan 30 02:29:45 server001 kernel: [3521062.240618]  [<ffffffff81078540>] ? wake_bit_function+0x0/0x40
Jan 30 02:29:45 server001 kernel: [3521062.240627]  [<ffffffff810e4480>] ? pagevec_lookup_tag+0x20/0x30
Jan 30 02:29:45 server001 kernel: [3521062.240634]  [<ffffffff810da665>] wait_on_page_writeback_range+0xf5/0x190
Jan 30 02:29:45 server001 kernel: [3521062.240644]  [<ffffffff81053668>] ? try_to_wake_up+0x118/0x340
Jan 30 02:29:45 server001 kernel: [3521062.240651]  [<ffffffff810da727>] filemap_fdatawait+0x27/0x30
Jan 30 02:29:45 server001 kernel: [3521062.240659]  [<ffffffff811431b4>] vfs_fsync+0xa4/0xf0
Jan 30 02:29:45 server001 kernel: [3521062.240667]  [<ffffffff81143239>] do_fsync+0x39/0x60
Jan 30 02:29:45 server001 kernel: [3521062.240674]  [<ffffffff8114328b>] sys_fsync+0xb/0x10
Jan 30 02:29:45 server001 kernel: [3521062.240682]  [<ffffffff81012042>] system_call_fastpath+0x16/0x1b
