Question

In a 2015 re:Invent talk, AWS mentioned that vacuum should be run not only after updates or deletes but also after inserts. Here is the relevant part of the talk:

http://www.youtube.com/watch?v=tZXp19q8RFo&t=16m2s

Supposedly there is some cleanup that must be done on blocks even if they only received inserts and this cleanup can be done either the first time a block is selected (slowing down reads) or during vacuum. Is this true and if so, exactly what cleanup must be done?

Was it helpful?

Solution

tl;dr: The first process that reads data after it is committed will set hint bits. That will dirty the page, creating write activity. The other thing VACUUM (but not other commands) does is marks the page as all-visible, if appropriate. VACUUM will eventually have to hit the table to freeze the tuples.

The work that needs to be done after an insert isn't really cleanup, at least not in the sense of the other work VACUUM normally does. Before I go into the details, note that this answer is based on current (unreleased) 9.6 code and I'm ignoring the effects of streaming replication, even though it can have an affect on visibility.

Because of MVCC, every time Postgres evaluates whether a tuple should be visible to a query, it must consider whether the transaction that created the tuple (recorded in the xmin hidden field) committed, along with some other criteria. That check is expensive, so as soon as it is known that a transaction is visible to all currently open transactions a "hint bit" is set on the tuple header indicating that. The setting of that bit dirties the page, which means it will have to be written to disk. This can be very confusing if the next command to read the data is a SELECT that is suddenly creating a lot of write traffic. Running a VACUUM after the insert commits will avoid that. Another important distinction is that VACUUM will ALWAYS hint tuples on a page (as long as it got the cleanup lock on the page), but most other commands will only hint if the inserting transaction committed before the command started.

An important point about writing all these hint bits is that VACUUM can be throttled (and autovacuum is throttled by default). Other commands are not throttled and will generate dirty data as rapidly as possible.

VACUUM is the only method for marking pages as all-visible, which is an important performance consideration for some operations (notably, index only scans). If you do a large insert it's very likely that there are many pages with nothing but newly inserted tuples. VACUUM can potentially mark those pages as all-visible, but only if the oldest running transaction when VACUUM started was newer than the transaction that inserted the data.

Because of how MVCC works, tuples that were inserted more than ~2 billion transactions ago must be marked as "frozen". By default autovacuum will kick in to do that every 200M transactions. Running a manual vacuum with vacuum_freeze_min_age set to 0 after a bulk insert can help reduce the impact of that. More aggressively, you could run VACUUM FREEZE on the table after inserting. That would "reset the clock" on when the next freeze scan would happen.

If you want to know the specific details, take a look at the HEAPTUPLE_LIVE case after the call to HeapTupleSatisfiesVacuum() inside lazy_scan_heap(). See also HeapTupleSatisfiesVacuum() itself, and compare it to HeapTupleSatisfiesMVCC().

There are two other presentations of mine that might be interesting. The first video is available from http://www.pgcon.org/2015/schedule/events/829.en.html, while the second one (which I think was a bit better) at https://www.youtube.com/watch?v=L8nErzxPJjQ

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top