PostgreSQL Index Only Scan can NOT return ctid

https://dba.stackexchange.com/questions/234740

29-01-2021
|

Question

I have this table in PostgreSQL 11:

CREATE TABLE A (id bigint PRIMARY KEY, text text)

Now I want to fetch the ctid of rows meeting some condition like id = 123.

However, even using pg_hint_plan to hint PostgreSQL to use Index Only Scan, it will always give me Index Scan:

/*+ IndexOnlyScan(a) */ explain analyze select ctid from a where id = 823977776533426178;
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Index Scan using a_pkey on a  (cost=0.14..8.16 rows=1 width=6) (actual time=0.038..0.039 rows=1 loops=1)
   Index Cond: (id = '823977776533426178'::bigint)
 Planning Time: 0.122 ms
 Execution Time: 0.095 ms
(4 rows)

My understanding is that ctid is the physical row id (block_no: record_offset) of each row and it must be included in any btree index, since it needs this information to fetch the row from heap file.

Then why can't an index-only scan return the ctid directly?
Is it just that PostgreSQL does not implement it that way?

Solution

Your understanding is almost but not quite correct. True, every btree index tuple needs a ctid (or some form of block number and tuple index) to point to the heap tuple (table row). But (at least) since the introduction of "heap-only tuples" with Postgres 8.3, there may be a HOT chain to follow to arrive at the current live tuple in the snapshot - with a different ctid than the one stored in the index.

Since it is also not possible to use the ctid (or any system column) as index expression, it is currently (and for the forseeable future) completely impossible to get a ctid from an index-only scan.

Quoting the Postgres Wiki on Index-only_scans / Interaction_with_HOT:

With HOT, it became possible for an index scan to traverse a so-called HOT chain; it could get from the physical index tuple (which would probably have been created by an original INSERT, and related to an earlier version of the logical tuple), to the corresponding physical heap tuple. The heap tuple would itself contain a pointer to the next version of the tuple (that is, the tuple ctid), which might, in turn, have a pointer of its own. The index scan eventually arrives at tuple that is current according to the query's snapshot.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange