How to maintain visibility of all new transactions in append-only PostgreSQL DB without scanning the whole table

dba.stackexchange https://dba.stackexchange.com/questions/259298

Question

Situation

PostgreSQL v11

I have a database with a dozen tables. No rows are ever DELETEd or UPDATEd. A bulk of data is INSERTed into all the tables in a 'few' (up to 1,000) transactions every day. Some tables can add tens of GBs of data during the INSERT (the largest has almost 2 billion rows as of now).

Problem

I have noticed that at some point the SELECT queries I use to read data from the DB stop using index-only scans. After some digging it became apparent that this is due to the visibility map becoming out of date. This is confirmed by running VACUUM, after which the planner reverts to index-only scans. However, VACUUM is very expensive in my case (it can take over 10 hours for the largest table), and autovacuum is never triggered because there are no DELETE or UPDATE operations.

I have looked at running VACUUM FREEZE after each transaction, but it seems it would need to scan the whole table each time, which again would take ages.

Question

What is the best way to mark all the new transactions as visible for append-only PostgreSQL without scanning the whole table every time?

Was it helpful?

The solution

You should run VACUUM (FREEZE) occasionally. The longer it doesn't run, the more it has to do, and the longer it will take.

To speed up VACUUM, increase maintenance_work_mem.
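A minimal sketch of this advice (the table name `big_table` and the memory value are placeholders, not from the original answer):

```sql
-- Give VACUUM more working memory for this session (value is an example).
SET maintenance_work_mem = '1GB';

-- Freeze tuples and update the visibility map. VERBOSE reports how many
-- pages were actually scanned versus skipped as already all-frozen.
VACUUM (FREEZE, VERBOSE) big_table;
```

Because the visibility map lets VACUUM skip pages that are already all-visible/all-frozen, running this regularly keeps each run proportional to the newly inserted data rather than the whole table.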

Other tips

What is the best way to mark all the new transactions as visible for append-only PostgreSQL without scanning the whole table every time?

PostgreSQL doesn't have to scan the parts of the table which are already marked as all visible/all frozen. If there are absolutely no obsolete tuples (which for append-only workloads there should not be any, unless some of your INSERTs have rolled back) then it might not have to scan the indexes either. So I don't think the problem you are worried about actually exists.
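One way to check this claim is the `pg_visibility` extension (available since PostgreSQL 9.6), which summarizes how much of a table is already marked all-visible/all-frozen in the visibility map; `big_table` is a placeholder table name:

```sql
CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- Count the pages already marked all-visible and all-frozen.
-- A subsequent VACUUM can skip those pages entirely.
SELECT * FROM pg_visibility_map_summary('big_table');
```

If `all_frozen` is close to the table's total page count, the next VACUUM (FREEZE) only has to visit the recently inserted pages.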

However, VACUUM is very expensive in my case (can take over 10 hours for the largest table)

How long had you let it go before running that VACUUM? How long did the next one after that take? There is nothing inherently wrong with a VACUUM taking 10 hours to complete; if that is a problem for you, you should describe what the actual problem is.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange