Why was an xlog checkpoint started?
02-03-2021
Question
Consider the following config
shared_buffers = 20GB
max_wal_size = 2GB
min_wal_size = 1GB
checkpoint_completion_target = 0.7
checkpoint_timeout = 5min
and the checkpoint log:
2020-05-25 16:28:37.128 | LOG: checkpoint starting: time
2020-05-25 16:32:07.098 | LOG: checkpoint complete: wrote 21420 buffers (0.8%); 0 WAL file(s) added, 0 removed, 7 recycled; write=209.932 s, sync=0.009 s, total=209.969 s; sync files=295, longest=0.002 s, average=0.000 s; distance=45191 kB, estimate=503334 kB
2020-05-25 16:33:37.165 | LOG: checkpoint starting: time
2020-05-25 16:37:08.001 | LOG: checkpoint complete: wrote 25041 buffers (1.0%); 0 WAL file(s) added, 0 removed, 25 recycled; write=209.842 s, sync=0.865 s, total=210.835 s; sync files=421, longest=0.120 s, average=0.002 s; distance=82180 kB, estimate=461218 kB
2020-05-25 16:38:02.690 | LOG: checkpoint starting: xlog
2020-05-25 16:41:32.181 | LOG: checkpoint complete: wrote 191629 buffers (7.3%); 0 WAL file(s) added, 0 removed, 42 recycled; write=209.239 s, sync=0.049 s, total=209.491 s; sync files=666, longest=0.006 s, average=0.000 s; distance=758070 kB, estimate=758070 kB
Why was the xlog checkpoint started at 16:38? Its distance from the previous checkpoint is just 758070 kB, which is far less than max_wal_size (2 GB).
PS
It seems that max_wal_size is spread across several checkpoints, and an xlog checkpoint is started once the WAL written since the last checkpoint exceeds max_wal_size / (2 + checkpoint_completion_target). Am I right?
Solution
Digging into the PG10 source code at https://github.com/postgres/postgres/blob/REL_10_STABLE/src/backend/access/transam/xlog.c#L2224 (thanks @pifor for the hint):
/*
 * Calculate CheckPointSegments based on max_wal_size_mb and
 * checkpoint_completion_target.
 */
static void
CalculateCheckpointSegments(void)
{
    double      target;

    /*-------
     * Calculate the distance at which to trigger a checkpoint, to avoid
     * exceeding max_wal_size_mb. This is based on two assumptions:
     *
     * a) we keep WAL for two checkpoint cycles, back to the "prev" checkpoint.
     * b) during checkpoint, we consume checkpoint_completion_target *
     *    number of segments consumed between checkpoints.
     *-------
     */
    target = (double) ConvertToXSegs(max_wal_size_mb) / (2.0 + CheckPointCompletionTarget);

    /* round down */
    CheckPointSegments = (int) target;
    if (CheckPointSegments < 1)
        CheckPointSegments = 1;
}
So in PG10 max_wal_size is effectively divided across 2 + checkpoint_completion_target checkpoint cycles: WAL is kept back to the "prev" checkpoint (two cycles), plus the fraction of a cycle consumed while the current checkpoint runs.
OTHER TIPS
According to the PostgreSQL 10 source code, this is due to CHECKPOINT_CAUSE_XLOG. From xlog.c, in function XLogWrite:
    /*
     * Request a checkpoint if we've consumed too much xlog since
     * the last one. For speed, we first check using the local
     * copy of RedoRecPtr, which might be out of date; if it looks
     * like a checkpoint is needed, forcibly update RedoRecPtr and
     * recheck.
     */
    if (IsUnderPostmaster && XLogCheckpointNeeded(openLogSegNo))
    {
        (void) GetRedoRecPtr();
        if (XLogCheckpointNeeded(openLogSegNo))
            RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
    }
with:
/*
 * Check whether we've consumed enough xlog space that a checkpoint is needed.
 *
 * new_segno indicates a log file that has just been filled up (or read
 * during recovery). We measure the distance from RedoRecPtr to new_segno
 * and see if that exceeds CheckPointSegments.
 *
 * Note: it is caller's responsibility that RedoRecPtr is up-to-date.
 */
static bool
XLogCheckpointNeeded(XLogSegNo new_segno)
{
    XLogSegNo   old_segno;

    XLByteToSeg(RedoRecPtr, old_segno, wal_segment_size);

    if (new_segno >= old_segno + (uint64) (CheckPointSegments - 1))
        return true;
    return false;
}
and, in the PG11+ version of CalculateCheckpointSegments (where WAL is kept for only one checkpoint cycle):
/*
 * Calculate CheckPointSegments based on max_wal_size_mb and
 * checkpoint_completion_target.
 */
static void
CalculateCheckpointSegments(void)
{
    double      target;

    /*-------
     * Calculate the distance at which to trigger a checkpoint, to avoid
     * exceeding max_wal_size_mb. This is based on two assumptions:
     *
     * a) we keep WAL for only one checkpoint cycle (prior to PG11 we kept
     *    WAL for two checkpoint cycles to allow us to recover from the
     *    secondary checkpoint if the first checkpoint failed, though we
     *    only did this on the master anyway, not on standby. Keeping just
     *    one checkpoint simplifies processing and reduces disk space in
     *    many smaller databases.)
     * b) during checkpoint, we consume checkpoint_completion_target *
     *    number of segments consumed between checkpoints.
     *-------
     */
    target = (double) ConvertToXSegs(max_wal_size_mb, wal_segment_size) /
        (1.0 + CheckPointCompletionTarget);

    /* round down */
    CheckPointSegments = (int) target;
    if (CheckPointSegments < 1)
        CheckPointSegments = 1;
}