PostgreSQL checkpoint log explained

https://dba.stackexchange.com/questions/185502

09-10-2020
|

Question

I know what PostgreSQL checkpoint is and when it is happening.

I need some additional information about the logs produced by the log_checkpoints = on parameter, so please explain some points of it to me:

2017-09-09 16:31:37 EEST [6428-6524] LOG: checkpoint complete: wrote 30057 buffers (22.9%); 0 transaction log file(s) added, 0 removed, 47 recycled; write=148.465 s, sync=34.339 s, total=182.814 s; sync files=159, longest=16.143 s, average=0.215 s

I know that 22.9% of shared buffers are written (I have 1024 MB shared_buffers so that means 234 MB are written out).
I know that 47 WAL files are recycled, i.e., they are not needed anymore for crash recovery, because the real data from them is already on disk.

Question A. But what about write=148.465 s and sync=34.339? What is the difference? What is write and why its time is far more than the fsync() operation?

Question B. What are sync files? Which files: WAL files? Why sync files are 159, but there are only 47 recycled files? What is the relation between these?

Thank you!

Solution

To your question A: "write" and "sync" mean exactly what they say; it's a sequence of buffered (as in filesystem I/O buffers) writes of dirty pages from the Postgres buffer pool, followed by fsync() (usually) to ensure that filesystem buffers are in fact written to disk. Writes are throttled and happen in the background, so they can take longer.

To your question B: sync files tells you how many OS files have been affected by the checkpoint. As each table and index is stored in its own file, the number gives you an idea of how many database objects have been affected by the checkpoint. WALs are synchronized when the WAL buffer is flushed, that is, way before the checkpoint; that's why it's called "write ahead".

OTHER TIPS

The write and sync phase mean rather different things (unfortunately). In the middle phase of a checkpoint, it writes out all the dirty buffers from shared buffers. It does this in a throttled way, mainly regulated by checkpoint_completion_target. The time reported here is the time it spent writing those buffers, as well as the time it spend sleeping in order to regulate the rate at which the buffers get written. So not all of the "write" time was spent actively writing. The length of time spent in the write phase, 2.5 minutes, doesn't really mean anything without knowing what your other settings are.

During the sync phase, it syncs all of those files it just wrote to (as well as any data files other processes may have written to). This stage is unthrottled, it syncs them as fast as it can. That this took 34 seconds, with one file in there taking 16.143 seconds, suggests that your IO hardware is not up to the workload you are throwing at it.

The 159 synced files refers to the number of data files that had been dirtied since the end of the last checkpoint, and so needed to be synced before the end of this one. The 47 refers to WAL files, not data files. WAL files contain references to changes made to data files. It is quite easy for 47 WAL files to contain references for 159 data files (or for that matter, 15,900 data files). But it is not quite that simple, because the WAL files being recycled are the ones from two checkpoints ago, so the relation between 47 and 159 is an analogy, not a concrete count.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange