Question

Say you have a large table with tens of millions of rows.

You want to UPDATE large_table SET col=value WHERE col=other_value... but col is not indexed and an EXPLAIN shows that this query will perform a seq scan over the whole table.

What is the lock behaviour here? According to most accounts Postgres only locks the affected rows of an UPDATE query and does not have lock escalation. So does it search for the rows to update first, then only lock the found rows? It seems like there would potentially be problems of other queries updating rows concurrently in that case though. Does it lock each row "as it finds them" i.e. locking rows progressively as it goes through the seq scan?

So I think the best case here is that it locks rows as it finds them, and only the affected rows are locked, for at most as long as the UPDATE query takes to complete.

But I am worried that this query could instead end up blocking all writes to the table until it completes.

I have read this: https://habr.com/en/company/postgrespro/blog/503008/ and I think the worst case will not happen, but here https://blog.heroku.com/curious-case-table-locking-update-query is a possibly inaccurate representation of similar info that gives me some doubts.

The application only uses SELECT, SELECT FOR UPDATE and UPDATE queries (i.e. no other explicit locks are taken apart from those). The table has foreign keys to other tables, and other tables have foreign keys to this table.

We're on Postgres 11.


Solution

For the discussion, let's assume your execution plan looks like

        QUERY PLAN        
--------------------------
 Update on mytab
   ->  Seq Scan on mytab
         Filter: (id = 1)

I also assume that you are using the default READ COMMITTED isolation level.

Then PostgreSQL will read sequentially through the table.

Whenever it finds a row that matches the filter, that row will be locked and updated.

If locking a row is blocked by a concurrent query, PostgreSQL waits until the lock goes away. Then it re-evaluates the filter condition and either moves on (if the condition no longer applies on account of a concurrent modification) or it locks and updates the modified row.

See the documentation:

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row.
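The re-evaluation described above can be sketched with two concurrent sessions. This is a hedged illustration using a hypothetical table `mytab(id int, col int)` containing one row `(1, 1)`:

```sql
-- Session 1:
BEGIN;
UPDATE mytab SET col = 2 WHERE id = 1;   -- locks the row; transaction stays open

-- Session 2 (its seq scan finds the old row version with col = 1,
-- then blocks waiting for session 1's row lock):
UPDATE mytab SET col = 3 WHERE col = 1;

-- Session 1:
COMMIT;

-- Session 2 now fetches the newly committed row version and
-- re-evaluates "col = 1" against it. Since session 1 changed col
-- to 2, the condition no longer matches, so session 2 skips the
-- row and reports UPDATE 0.
```

Note that session 2 only waits for rows that actually match its filter; rows it merely scans past are never locked.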

In particular, it is possible that two UPDATE statements that each modify several rows deadlock with each other, since they acquire locks as they proceed, and locks are always held until the end of the transaction.
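A minimal sketch of such a deadlock, again with the hypothetical table `mytab` and two transactions that lock the same two rows in opposite order:

```sql
-- Session 1:                          -- Session 2:
BEGIN;                                 BEGIN;
UPDATE mytab SET col = 0
  WHERE id = 1;  -- locks row 1
                                       UPDATE mytab SET col = 0
                                         WHERE id = 2;  -- locks row 2
UPDATE mytab SET col = 0
  WHERE id = 2;  -- blocks on session 2
                                       UPDATE mytab SET col = 0
                                         WHERE id = 1;  -- blocks on session 1
-- After deadlock_timeout (default 1 s), PostgreSQL detects the cycle
-- and aborts one of the transactions with ERROR 40P01 (deadlock detected);
-- the other proceeds.
```

The usual mitigation is to update rows in a consistent order (e.g. ORDER BY the key in a preceding SELECT FOR UPDATE) so that concurrent statements cannot acquire locks in conflicting orders.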

Licensed under: CC-BY-SA with attribution