Question

Since the whole point of MVCC is that readers do not block writers and writers do not block readers, how does PostgreSQL prevent a reader from seeing inconsistent data while a writer is in the middle of committing its changes to blocks/pages?

From what I understand of PostgreSQL's MVCC, each table is stored as a heap file, and each heap file consists of pages (blocks) of 8 kB each.

Say two transactions, T1 and T2, are running concurrently, and T1 decides to modify block1. T1 first reads block1, makes its modifications in memory, and when it is done, it commits, i.e. it actually writes these in-memory changes back to block1. I understand that the xmax of the existing entries it modified is set to id(T1) and that a new entry is created with xmin set to id(T1). What I find difficult is this: suppose T2 reads block1 while T1's commit is in progress (the write has started but not finished). How is this case handled?

Any answer would be appreciated

Solution

That's the whole idea of MVCC: Multi-Version Concurrency Control.

Whenever data is modified while other transactions are reading it, or have read it earlier (depending on the reader's isolation level), the pre-modification version of the data is kept available to those transactions so that they are not blocked. In PostgreSQL this is not a separate copy: the old row version simply stays in the table next to the new one. It is kept until no transaction can still see it, so every reader sees a consistent version.

This is the price you pay with MVCC: the server needs to maintain multiple versions of the same data, whereas lock-based (pessimistic) isolation uses only one copy of the data but can cause blocking. No free lunch :-)

You can read all the details here: https://www.postgresql.org/docs/11/mvcc.html

I also invite you to take my Pluralsight course on this exact topic, which covers MVCC and isolation levels in depth with real-life examples. Individuals can get a one-month free Pluralsight trial.

OTHER TIPS

What I find difficult is this: suppose T2 reads block1 while T1's commit is in progress (the write has started but not finished). How is this case handled?

It doesn't have to be handled because this step doesn't really exist.

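You're assuming that a transaction modifies blocks exclusively in memory until it commits, at which point all the modified blocks are flushed to disk. It doesn't work like that at all. If it did, the amount of data a transaction could change would be limited by available memory, and that's not the case. Modified pages live in the shared buffer cache and can be written to disk before the transaction commits; committing essentially consists of recording that the transaction is committed, not of writing its data pages.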

Also, MVCC in Postgres works at the row level, not at the page level.

A logical row in a table is stored on disk as possibly several versions of that row, each stamped with the (xmin, xmax) system columns. When updating a row, a transaction does not overwrite the "current" content on disk: it stamps the old version's xmax with its own ID and creates a new version with the new content. Each transaction knows which version (if any) of the row is visible to it based on these per-version (xmin, xmax) values. When a version is no longer visible to any transaction, VACUUM will eventually notice it and reclaim its space on disk. These are called "dead rows".
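To make the versioning and the "dead rows" idea concrete, here is a minimal Python sketch under a deliberately simplified, hypothetical model: a snapshot is just the set of transaction IDs whose commits it can see, and `Version`, `visible`, `update` and `vacuum` are illustrative names, not PostgreSQL functions. The dead-row rule used here ("replaced by a committed transaction and invisible to every active snapshot") is also a simplification; PostgreSQL's actual VACUUM decides this more conservatively, based on the oldest transaction ID that any running transaction may still need.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set

# Hypothetical simplified model: a snapshot is represented as the set of
# transaction ids whose commits it can see.

@dataclass
class Version:
    value: int
    xmin: int                   # xid that created this version
    xmax: Optional[int] = None  # xid that replaced it, if any

def visible(v: Version, snap: FrozenSet[int]) -> bool:
    created = v.xmin in snap
    replaced = v.xmax is not None and v.xmax in snap
    return created and not replaced

def update(row: List[Version], xid: int, new_value: int) -> None:
    # Never overwrite in place: stamp the live version's xmax, append a new one.
    for v in row:
        if v.xmax is None:
            v.xmax = xid
    row.append(Version(new_value, xmin=xid))

def vacuum(row: List[Version], committed: Set[int],
           active: List[FrozenSet[int]]) -> int:
    # Simplified dead-row rule: replaced by a committed transaction and
    # invisible to every active snapshot (future snapshots won't see it either).
    live = [v for v in row
            if not (v.xmax in committed
                    and not any(visible(v, s) for s in active))]
    removed = len(row) - len(live)
    row[:] = live
    return removed

committed: Set[int] = {10}
row = [Version(1, xmin=10)]      # one logical row, one version so far
old_snap = frozenset(committed)  # a long-running reader starts here

update(row, 11, 2)
committed.add(11)
update(row, 12, 3)
committed.add(12)
new_snap = frozenset(committed)

print([(v.value, v.xmin, v.xmax) for v in row])        # [(1, 10, 11), (2, 11, 12), (3, 12, None)]
print([v.value for v in row if visible(v, old_snap)])  # [1]
print([v.value for v in row if visible(v, new_snap)])  # [3]

# Once the old reader has finished, no active snapshot needs versions 1 and 2.
print(vacuum(row, committed, active=[]))  # 2
print([v.value for v in row])             # [3]
```

The old reader keeps seeing value 1 for as long as it runs, even though two newer versions exist; only after it finishes do the earlier versions become reclaimable.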

So there is never a "flush" step during which a transaction T2 would need to read a row changed and committed by T1 and would be uncertain about whether it's reading a pre-T1 or post-T1 value. The reads made by T2 will always reach the only version of the row that T2 can see, thanks to the filtering with (xmin,xmax).
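The scenario from the question can be sketched in Python under a heavily simplified, hypothetical model: `heap` is a plain list of row versions, `commit_log` stands in for PostgreSQL's transaction status data, and a snapshot records the next transaction ID plus the set of transactions still running when it was taken. In this model T1's new version is already in the heap before T1 commits, and committing is a single status change; T2 still sees exactly one consistent version, because visibility is decided from the (xmin, xmax) stamps and the snapshot, not from when pages were written. All names and rules here are illustrative, not PostgreSQL's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Hypothetical, heavily simplified model of PostgreSQL-style row versioning.
# Real PostgreSQL stores versions in 8 kB heap pages and tracks transaction
# status in pg_xact; the visibility rules below keep only the core idea.

@dataclass
class RowVersion:
    key: str
    value: str
    xmin: int                   # xid that created this version
    xmax: Optional[int] = None  # xid that replaced/deleted it, if any

@dataclass(frozen=True)
class Snapshot:
    xmax: int                    # xids >= xmax had not started yet
    in_progress: FrozenSet[int]  # xids still running when the snapshot was taken

commit_log: Dict[int, str] = {}  # xid -> "in progress" | "committed"
heap: List[RowVersion] = []      # every version, committed or not
next_xid = 100

def begin() -> int:
    global next_xid
    xid = next_xid
    next_xid += 1
    commit_log[xid] = "in progress"
    return xid

def commit(xid: int) -> None:
    # No data pages are written here: in this model, committing is one status change.
    commit_log[xid] = "committed"

def take_snapshot() -> Snapshot:
    running = frozenset(x for x, s in commit_log.items() if s == "in progress")
    return Snapshot(xmax=next_xid, in_progress=running)

def committed_before(snap: Snapshot, xid: int) -> bool:
    # A transaction counts only if it had committed when the snapshot was taken.
    return (xid < snap.xmax and xid not in snap.in_progress
            and commit_log.get(xid) == "committed")

def visible(snap: Snapshot, v: RowVersion) -> bool:
    return committed_before(snap, v.xmin) and not (
        v.xmax is not None and committed_before(snap, v.xmax))

def read(snap: Snapshot, key: str) -> Optional[str]:
    hits = [v.value for v in heap if v.key == key and visible(snap, v)]
    return hits[0] if hits else None

def update(xid: int, key: str, new_value: str) -> None:
    # Written straight into the heap, before any commit. Assumes at most one
    # un-replaced version per key, which holds in this example.
    for v in heap:
        if v.key == key and v.xmax is None:
            v.xmax = xid
    heap.append(RowVersion(key, new_value, xmin=xid))

t0 = begin()
heap.append(RowVersion("acct", "balance=10", xmin=t0))
commit(t0)

t1 = begin()                          # T1 starts and updates the row
update(t1, "acct", "balance=20")      # the new version is already in the heap
t2_snap = take_snapshot()             # T2 reads while T1 is still running
print(read(t2_snap, "acct"))          # balance=10

commit(t1)
print(read(t2_snap, "acct"))          # balance=10 (T1 was running at snapshot time)
print(read(take_snapshot(), "acct"))  # balance=20
```

T2's first snapshot keeps showing the old balance even after T1 commits, because T1 was still running when that snapshot was taken; only a newer snapshot sees the new version.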

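From T2's point of view, the switch to the new row versions created by T1 is instantaneous: it happens when T2 takes a new snapshot. Under Read Committed, each statement takes a new snapshot, so this switch can happen in the middle of T2, between two of its statements. Under Repeatable Read or Serializable, T2 uses a single snapshot for the whole transaction, so it never sees T1's changes if T1 committed after that snapshot was taken.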
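The difference between the isolation levels can be sketched in Python with a tiny hypothetical model: each row version is a `(value, xmin, xmax)` tuple with `xmax == 0` meaning "not replaced", and a snapshot is the set of transaction IDs whose commits it can see. `run_t2` and `scan` are illustrative names. T2 runs the same query twice, and T1 updates the row and commits in between; Serializable would behave like Repeatable Read here, since it also uses one snapshot per transaction.

```python
from typing import FrozenSet, List

# Hypothetical simplified model: each row version is (value, xmin, xmax),
# with xmax == 0 meaning "not replaced", and a snapshot is the set of
# transaction ids whose commits it can see.

def run_t2(isolation: str) -> List[List[str]]:
    """T2 runs the same query twice; T1 updates the row and commits in between."""
    committed = {1}
    row = [("old", 1, 0)]

    def scan(snap: FrozenSet[int]) -> List[str]:
        return [val for val, xmin, xmax in row
                if xmin in snap and not (xmax and xmax in snap)]

    # T2's first statement; under Repeatable Read this snapshot is then
    # reused for the rest of the transaction.
    first_snap = frozenset(committed)
    first = scan(first_snap)

    # T1 (xid 2) replaces the row version and commits.
    val, xmin, _ = row[0]
    row[0] = (val, xmin, 2)
    row.append(("new", 2, 0))
    committed.add(2)

    if isolation == "read committed":
        second_snap = frozenset(committed)  # new snapshot for each statement
    elif isolation == "repeatable read":
        second_snap = first_snap            # same snapshot for the whole transaction
    else:
        raise ValueError(isolation)
    return [first, scan(second_snap)]

print(run_t2("read committed"))   # [['old'], ['new']]
print(run_t2("repeatable read"))  # [['old'], ['old']]
```

In neither case does T2 ever see a mix of old and new data: each query sees exactly one version of the row, and which one depends only on the snapshot it uses.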

Licensed under: CC-BY-SA with attribution