Why was an xlog checkpoint started?
02-03-2021
Question
Consider the following config
shared_buffers = 20GB
max_wal_size = 2GB
min_wal_size = 1GB
checkpoint_completion_target = 0.7
checkpoint_timeout = 5min
and the checkpoint log:
2020-05-25 16:28:37.128 | LOG: checkpoint starting: time
2020-05-25 16:32:07.098 | LOG: checkpoint complete: wrote 21420 buffers (0.8%); 0 WAL file(s) added, 0 removed, 7 recycled; write=209.932 s, sync=0.009 s, total=209.969 s; sync files=295, longest=0.002 s, average=0.000 s; distance=45191 kB, estimate=503334 kB
2020-05-25 16:33:37.165 | LOG: checkpoint starting: time
2020-05-25 16:37:08.001 | LOG: checkpoint complete: wrote 25041 buffers (1.0%); 0 WAL file(s) added, 0 removed, 25 recycled; write=209.842 s, sync=0.865 s, total=210.835 s; sync files=421, longest=0.120 s, average=0.002 s; distance=82180 kB, estimate=461218 kB
2020-05-25 16:38:02.690 | LOG: checkpoint starting: xlog
2020-05-25 16:41:32.181 | LOG: checkpoint complete: wrote 191629 buffers (7.3%); 0 WAL file(s) added, 0 removed, 42 recycled; write=209.239 s, sync=0.049 s, total=209.491 s; sync files=666, longest=0.006 s, average=0.000 s; distance=758070 kB, estimate=758070 kB
Why was the xlog checkpoint started at 16:38? Its distance from the previous checkpoint is just 758070 kB, which is far less than max_wal_size (2 GB).
PS
It seems that max_wal_size is spread across several checkpoints, and an xlog checkpoint is started once the WAL written since the last checkpoint exceeds max_wal_size / (2 + checkpoint_completion_target). Am I right?
Solution
Digging into the PG10 source code at https://github.com/postgres/postgres/blob/REL_10_STABLE/src/backend/access/transam/xlog.c#L2224 (thanks @pifor for the hint):
/*
 * Calculate CheckPointSegments based on max_wal_size_mb and
 * checkpoint_completion_target.
 */
static void
CalculateCheckpointSegments(void)
{
    double      target;

    /*-------
     * Calculate the distance at which to trigger a checkpoint, to avoid
     * exceeding max_wal_size_mb. This is based on two assumptions:
     *
     * a) we keep WAL for two checkpoint cycles, back to the "prev" checkpoint.
     * b) during checkpoint, we consume checkpoint_completion_target *
     *    number of segments consumed between checkpoints.
     *-------
     */
    target = (double) ConvertToXSegs(max_wal_size_mb) / (2.0 + CheckPointCompletionTarget);

    /* round down */
    CheckPointSegments = (int) target;
    if (CheckPointSegments < 1)
        CheckPointSegments = 1;
}
So in PG10 max_wal_size is effectively divided across 2 + checkpoint_completion_target checkpoint cycles: WAL is kept back to the "prev" checkpoint (two cycles), plus the fraction of a cycle consumed while the current checkpoint runs.
OTHER TIPS
According to the PostgreSQL 10 source code, this is due to CHECKPOINT_CAUSE_XLOG. From xlog.c, in function XLogWrite:
    /*
     * Request a checkpoint if we've consumed too much xlog since
     * the last one. For speed, we first check using the local
     * copy of RedoRecPtr, which might be out of date; if it looks
     * like a checkpoint is needed, forcibly update RedoRecPtr and
     * recheck.
     */
    if (IsUnderPostmaster && XLogCheckpointNeeded(openLogSegNo))
    {
        (void) GetRedoRecPtr();
        if (XLogCheckpointNeeded(openLogSegNo))
            RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
    }
with:
/*
 * Check whether we've consumed enough xlog space that a checkpoint is needed.
 *
 * new_segno indicates a log file that has just been filled up (or read
 * during recovery). We measure the distance from RedoRecPtr to new_segno
 * and see if that exceeds CheckPointSegments.
 *
 * Note: it is caller's responsibility that RedoRecPtr is up-to-date.
 */
static bool
XLogCheckpointNeeded(XLogSegNo new_segno)
{
    XLogSegNo   old_segno;

    XLByteToSeg(RedoRecPtr, old_segno, wal_segment_size);

    if (new_segno >= old_segno + (uint64) (CheckPointSegments - 1))
        return true;
    return false;
}
and, in the PG11+ version of CalculateCheckpointSegments (where WAL is kept for only one checkpoint cycle):
/*
 * Calculate CheckPointSegments based on max_wal_size_mb and
 * checkpoint_completion_target.
 */
static void
CalculateCheckpointSegments(void)
{
    double      target;

    /*-------
     * Calculate the distance at which to trigger a checkpoint, to avoid
     * exceeding max_wal_size_mb. This is based on two assumptions:
     *
     * a) we keep WAL for only one checkpoint cycle (prior to PG11 we kept
     *    WAL for two checkpoint cycles to allow us to recover from the
     *    secondary checkpoint if the first checkpoint failed, though we
     *    only did this on the master anyway, not on standby. Keeping just
     *    one checkpoint simplifies processing and reduces disk space in
     *    many smaller databases.)
     * b) during checkpoint, we consume checkpoint_completion_target *
     *    number of segments consumed between checkpoints.
     *-------
     */
    target = (double) ConvertToXSegs(max_wal_size_mb, wal_segment_size) /
        (1.0 + CheckPointCompletionTarget);

    /* round down */
    CheckPointSegments = (int) target;
    if (CheckPointSegments < 1)
        CheckPointSegments = 1;
}