Question

My understanding is that an update locks a tuple, marks it as deleted, and then adds a new tuple.

In other words, update = delete + insert.

Or so I had believed. But it appears there is something fundamentally different between UPDATE and DELETE + INSERT in MVCC.


Setup:

CREATE TABLE example (a int PRIMARY KEY, b int);
INSERT INTO example VALUES (1, 1);

Method 1: Update

-- session A                          session B
BEGIN;
UPDATE example SET b = 2 WHERE a = 1;
                                      DELETE FROM example WHERE a = 1;
COMMIT;
-- now there are 0 rows in table example (1 row was deleted by session B)

Method 2: Delete and insert

-- session A                          session B
BEGIN;
DELETE FROM example WHERE a = 1;
INSERT INTO example VALUES (1, 2);
                                      DELETE FROM example WHERE a = 1;
COMMIT;
-- now there is 1 row in table example (0 rows deleted by session B)

Thus

UPDATE example SET b = 2 WHERE a = 1;

is different from

DELETE FROM example WHERE a = 1;
INSERT INTO example VALUES (1, 2);

How am I to understand the MVCC nature of UPDATE? Does the tuple have some sort of MVCC "identity" that is preserved during the update? What is it?


Solution

Yes, there is a difference between UPDATE and DELETE + INSERT.

Let's use the pageinspect extension (installed with CREATE EXTENSION pageinspect;) to look at the tuples and the tuple headers.

If you want to repeat my experiment, you have to drop and re-create the table in between. Also, there may be additional flags (hint bits) if you selected the rows before examining them.

The meaning of infomask2 and infomask can be found in src/include/access/htup_details.h, see the quotations at the end of the answer.

After the UPDATE:

SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask, t_attrs
FROM heap_page_item_attrs(get_raw_page('example', 0), 'example');

 lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask |            t_attrs            
----+--------+--------+--------+-------------+------------+-------------------------------
  1 | 380943 | 380944 | (0,2)  |       16386 |        256 | {"\\x01000000","\\x02000000"}
  2 | 380944 |      0 | (0,2)  |       32770 |      10240 | {"\\x01000000","\\x02000000"}
(2 rows)
  • The first tuple is the dead one. Its t_ctid has been changed to point to the updated version.

    This is one of the key points, so let me expand on this: the ctid of a tuple is the combination of the block number and the “line pointer” (lp in the query result). Normally t_ctid just points to the tuple itself and is redundant, but in this case it is used to point to the new row version. This is the link between the original tuple and the updated version.

    t_infomask2 is 2 (the number of columns) plus HEAP_HOT_UPDATED, so this row received a HOT update (there was enough space in the block, and no indexed column was modified: b is not part of the primary key index). t_infomask is HEAP_XMIN_COMMITTED (a hint bit).

  • The second tuple is the new version.

    t_infomask2 is 2 plus HEAP_ONLY_TUPLE, so this is a “heap-only tuple”, which is only reachable via the updated ctid of the old version. t_infomask is HEAP_XMAX_INVALID (true, t_xmax is 0) plus HEAP_UPDATED (this is the updated version).

After the DELETE + INSERT:

SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask, t_attrs
FROM heap_page_item_attrs(get_raw_page('example', 0), 'example');

 lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask |            t_attrs            
----+--------+--------+--------+-------------+------------+-------------------------------
  1 | 380958 | 380961 | (0,1)  |        8194 |        256 | {"\\x01000000","\\x02000000"}
  2 | 380961 |      0 | (0,2)  |           2 |       2048 | {"\\x01000000","\\x02000000"}
(2 rows)
  • Again, the first tuple is the dead one.

    t_infomask2 is 2 plus HEAP_KEYS_UPDATED (the tuple was deleted, or updated with a change to a key column), and t_infomask is HEAP_XMIN_COMMITTED (the tuple was valid before it was deleted).

  • The second tuple is the inserted one:

    t_infomask2 is just 2 (the number of columns, no flags set), and t_infomask is HEAP_XMAX_INVALID (t_xmax is 0), so this is a new tuple with no connection to the deleted one.

Explanation of the observed difference:

In the READ COMMITTED isolation level (the default), a transaction always sees the latest committed version of a row. The DELETE in session B has to lock the row, so it is blocked by the UPDATE or DELETE in session A until session A commits.

The documentation explains what happens when the lock is released:

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row.

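In the case of the UPDATE, there is a link between the old and the new row version (the t_ctid of the old tuple), so after session A commits, session B follows it to the new row version, re-checks the WHERE clause, then locks and deletes that version. In the case of DELETE + INSERT, the row that B was waiting for has been deleted and has no such link, and the newly inserted row is not visible to B's snapshot (it was not committed when B's statement started), so B ignores it and nothing is deleted.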

So while in many respects UPDATE and DELETE + INSERT are quite similar in PostgreSQL, they are not the same: in the second case, there is no connection between the deleted and the inserted row.

Appendix: the meaning of infomask and infomask2

t_infomask:

/*
 * information stored in t_infomask:
 */
#define HEAP_HASNULL            0x0001  /* has null attribute(s) */
#define HEAP_HASVARWIDTH        0x0002  /* has variable-width attribute(s) */
#define HEAP_HASEXTERNAL        0x0004  /* has external stored attribute(s) */
#define HEAP_HASOID             0x0008  /* has an object-id field */
#define HEAP_XMAX_KEYSHR_LOCK   0x0010  /* xmax is a key-shared locker */
#define HEAP_COMBOCID           0x0020  /* t_cid is a combo cid */
#define HEAP_XMAX_EXCL_LOCK     0x0040  /* xmax is exclusive locker */
#define HEAP_XMAX_LOCK_ONLY     0x0080  /* xmax, if valid, is only a locker */

 /* xmax is a shared locker */
#define HEAP_XMAX_SHR_LOCK  (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)

#define HEAP_LOCK_MASK  (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
                         HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_XMIN_COMMITTED     0x0100  /* t_xmin committed */
#define HEAP_XMIN_INVALID       0x0200  /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN        (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
#define HEAP_XMAX_COMMITTED     0x0400  /* t_xmax committed */
#define HEAP_XMAX_INVALID       0x0800  /* t_xmax invalid/aborted */
#define HEAP_XMAX_IS_MULTI      0x1000  /* t_xmax is a MultiXactId */
#define HEAP_UPDATED            0x2000  /* this is UPDATEd version of row */
#define HEAP_MOVED_OFF          0x4000  /* moved to another place by pre-9.0
                                         * VACUUM FULL; kept for binary
                                         * upgrade support */
#define HEAP_MOVED_IN           0x8000  /* moved from another place by pre-9.0
                                         * VACUUM FULL; kept for binary
                                         * upgrade support */
#define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)

#define HEAP_XACT_MASK          0xFFF0  /* visibility-related bits */

t_infomask2:

/*
 * information stored in t_infomask2:
 */
#define HEAP_NATTS_MASK         0x07FF  /* 11 bits for number of attributes */
/* bits 0x1800 are available */
#define HEAP_KEYS_UPDATED       0x2000  /* tuple was updated and key cols
                                         * modified, or tuple deleted */
#define HEAP_HOT_UPDATED        0x4000  /* tuple was HOT-updated */
#define HEAP_ONLY_TUPLE         0x8000  /* this is heap-only tuple */

#define HEAP2_XACT_MASK         0xE000  /* visibility-related bits */
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange