PostgreSQL covering index on large table as alternative to CLUSTER command

https://dba.stackexchange.com/questions/192807

10-10-2020

Question

I'm looking for an alternative to executing the CLUSTER command on large tables to keep data efficiently organized on disk, since it requires an ACCESS EXCLUSIVE lock on tables being clustered, making them unavailable for a considerable amount of time.

After reading about covering indexes I got the idea of creating a btree multi-column index specifying all of my table columns, and not just those relevant for the query condition.

The benefits, I suppose, would be twofold:

The query plan would be optimized because the condition-relevant columns are specified first in the index, matching the query's expected data fetch pattern.
All remaining columns would also be included in the index, meaning this index would be a copy of the table, already organized efficiently on disk thanks to the btree structure.

I believe queries over this index would be very efficient: since it covers the whole table, they would not need to access table data scattered on disk.
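To make the idea concrete, here is a minimal sketch of such an index. The table `events` and all of its columns are hypothetical, chosen only for illustration; whether the planner actually picks an index-only scan depends on statistics and on the visibility map.

```sql
-- Hypothetical table; every name here is an illustrative assumption.
CREATE TABLE events (
    customer_id integer     NOT NULL,
    created_at  timestamptz NOT NULL,
    status      text        NOT NULL,
    amount      numeric     NOT NULL
);

-- Columns used in the WHERE clause come first,
-- followed by every remaining column of the table.
CREATE INDEX events_all_cols_idx
    ON events (customer_id, created_at, status, amount);

-- A query that touches only indexed columns, so it is a candidate
-- for an index-only scan. Inspect the plan to see what is chosen.
EXPLAIN
SELECT customer_id, created_at, status, amount
FROM events
WHERE customer_id = 42
  AND created_at >= now() - interval '7 days';
```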

I would like to ask whether this is a reasonable solution, and what the potential downsides of this approach are (besides the additional storage space required for the covering index and the expected, small?, decrease in insert/update/delete performance).

Solution

PostgreSQL does not have true covering indexes, that is, indexes that carry extra non-key columns alongside the key columns. There has been some work on them, but it has not yet been accepted into the code tree.

So there are limitations. If you have a unique constraint on some prefix of the indexed columns, you will need a separate index to support it; it can't piggy-back on your larger index. Also, some data types do not support btree operators, which means they cannot be included in a btree index: for example, XML and many geometric types.
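Both limitations can be sketched as follows. The table `orders` and its columns are hypothetical; the commented-out statement is the one expected to be rejected, because `point` has no default btree operator class.

```sql
-- Hypothetical table; names are assumptions for illustration only.
CREATE TABLE orders (
    order_no integer NOT NULL,
    customer text    NOT NULL,
    location point
);

-- The wide index orders rows by (order_no, customer), but it cannot
-- enforce uniqueness of order_no on its own.
CREATE INDEX orders_wide_idx ON orders (order_no, customer);

-- So uniqueness of the prefix still needs its own index.
CREATE UNIQUE INDEX orders_order_no_key ON orders (order_no);

-- Expected to fail: a geometric column cannot be a btree key column.
-- CREATE INDEX orders_with_location_idx ON orders (order_no, location);
```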

Also, an indexed value cannot exceed 1/3 of a page, or about 2700 bytes, so including some wide columns in the index could fail on these grounds. (I don't think covering indexes, if they existed, would solve this problem.)
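One rough way to spot columns at risk before building the index is to look at the widest stored values. This is only a sketch: the table `docs` and column `body` are hypothetical, and `octet_length` measures the uncompressed size, so it is a conservative check (a value may still fit if it compresses well).

```sql
-- Hypothetical table; names are assumptions for illustration only.
CREATE TABLE docs (
    id   integer NOT NULL,
    body text
);

-- Count rows whose text value alone is wider than the rough
-- 2700-byte figure, and report the widest value seen.
SELECT count(*)                FILTER (WHERE octet_length(body) > 2700) AS rows_at_risk,
       max(octet_length(body)) AS widest_value_bytes
FROM docs;
```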

Finally, you only avoid consulting the table for those table pages marked as "all visible". If the table is mostly static, then this is probably not a problem. But if the table is very dynamic, it will take very aggressive vacuuming to maintain the "all visible" count.
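A sketch of how this could be monitored and tuned, assuming a hypothetical table named `measurements`. `relpages` and `relallvisible` in `pg_class` are estimates refreshed by VACUUM and ANALYZE, and the scale factor shown is an arbitrary example value, not a recommendation.

```sql
-- Hypothetical table; names are assumptions for illustration only.
CREATE TABLE measurements (
    device_id integer     NOT NULL,
    taken_at  timestamptz NOT NULL,
    reading   numeric
);

-- Estimated share of heap pages currently marked all-visible.
SELECT relname,
       relpages,
       relallvisible,
       round(relallvisible::numeric / nullif(relpages, 0), 2) AS visible_fraction
FROM pg_class
WHERE relname = 'measurements';

-- Make autovacuum visit this table more often than the default.
ALTER TABLE measurements SET (autovacuum_vacuum_scale_factor = 0.01);
```

When an index-only scan is used, the "Heap Fetches" figure in `EXPLAIN (ANALYZE)` output shows how many rows still required a visit to the table.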

Partitioning the table may be another option. It is not nearly as fine-grained as clustering, but it can impose less of an ongoing burden, particularly if all the active updates go against 1 or 2 partitions while the rest are effectively read-only.
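As a minimal sketch, assuming PostgreSQL 10 or later (declarative partitioning) and a hypothetical `readings` table partitioned by month, this might look like:

```sql
-- Hypothetical parent table, partitioned by time range.
CREATE TABLE readings (
    sensor_id   integer     NOT NULL,
    recorded_at timestamptz NOT NULL,
    value       numeric
) PARTITION BY RANGE (recorded_at);

-- An older partition that is effectively read-only...
CREATE TABLE readings_2017_11 PARTITION OF readings
    FOR VALUES FROM ('2017-11-01') TO ('2017-12-01');

-- ...and the current partition that receives the active writes.
CREATE TABLE readings_2017_12 PARTITION OF readings
    FOR VALUES FROM ('2017-12-01') TO ('2018-01-01');

-- Indexes are created on each partition individually here.
CREATE INDEX readings_2017_11_idx ON readings_2017_11 (sensor_id, recorded_at);
CREATE INDEX readings_2017_12_idx ON readings_2017_12 (sensor_id, recorded_at);

-- Once a partition stops changing, one vacuum pass can mark its pages
-- all-visible, and they should stay that way.
VACUUM (ANALYZE) readings_2017_11;
```

Rows for the same month end up physically together in one partition, which is the coarse-grained grouping described above.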

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange