Question

I am transitioning from SQL Server to Postgres, and one of the biggest things for me to digest is the absence of a "clustered key" that keeps the data physically sorted in Postgres.

Can someone share their thoughts on how Postgres avoids the need for an internally sorted dataset, and how it works with large heap tables while still delivering exceptional performance?


Solution

You can try the pg_repack extension to cluster a table online, with less locking than the built-in CLUSTER command.
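A rough sketch of that approach (database, table, and column names here are placeholders):

```sql
-- In the target database, install the extension's objects first:
CREATE EXTENSION pg_repack;

-- Then, from the shell, rewrite the table in index order while holding
-- only brief locks (requires the pg_repack client binary):
--   pg_repack --table=my_table --order-by=my_indexed_col mydb
```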

OTHER TIPS

PostgreSQL simply doesn't implement this feature. There is no trick to not implementing it; it is just not implemented, in the straightforward, uncomplicated way of not doing it. To use one bit of jargon, all btree indexes in PostgreSQL are "secondary indexes", not "primary indexes". Even the primary key's index is a "secondary index".
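To see what "secondary index" means in practice: even a primary-key lookup goes through the index to find the row's physical location (its ctid), then fetches the row from the heap. A minimal sketch (the table and column names are illustrative):

```sql
CREATE TABLE t (id int PRIMARY KEY, payload text);

-- The plan will typically be an Index Scan using t_pkey: the btree stores
-- (id, ctid) pairs, and the payload is fetched from whatever heap page the
-- ctid points to -- the heap itself is in no particular order.
EXPLAIN SELECT payload FROM t WHERE id = 42;
```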

There are some cases where clustered keys (or index-organized tables, as another product calls them) are important, and in those cases PostgreSQL fails to "supply exceptional performance". You can argue about how common those cases are, of course, but they certainly do exist, and it is unfortunate that PostgreSQL doesn't offer a solution for them. There have been proposals to address this, but I don't think any of those efforts are currently active.

In some cases, you can ameliorate the problem by using the CLUSTER command, or by implementing partitioning, or by using covering indexes, but none of these is entirely satisfactory as an alternative to real clustering.
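Those workarounds look roughly like this (object names are illustrative; note that CLUSTER is a one-time rewrite that takes an exclusive lock, and INCLUDE requires PostgreSQL 11 or later):

```sql
-- One-time physical reordering of the heap by an index;
-- the ordering is not maintained as new rows arrive.
CLUSTER orders USING orders_customer_id_idx;

-- A covering index: frequently read columns are stored in the index
-- itself, so matching queries can use index-only scans and skip the heap.
CREATE INDEX orders_cust_covering
    ON orders (customer_id) INCLUDE (order_date, total);
```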

PostgreSQL doesn't do anything special to replace the "need" of a clustered index.

It just simply doesn't have that feature. (Some would say that isn't a great loss.)

You can manually perform a one-time cluster with CLUSTER or pg_repack.

There is also declarative partitioning (though it has a number of caveats prior to PostgreSQL 11). It isn't quite clustering, but it can be used to group rows into specified buckets.
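Declarative partitioning groups rows into buckets by range (or by list or hash). A minimal range-partitioning sketch, with illustrative names:

```sql
CREATE TABLE measurements (
    logdate date NOT NULL,
    reading numeric
) PARTITION BY RANGE (logdate);

-- Rows are routed to the partition whose range matches logdate.
CREATE TABLE measurements_2023 PARTITION OF measurements
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
```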

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange