Why does MongoDB 4.0 require journaling for replica set members using WiredTiger?

https://dba.stackexchange.com/questions/257021

21-02-2021

Question

This is a very interesting change in version 4.0 since checkpoints seem to serve a very similar role to journaling.

The checkpoint ensures that the data files are consistent up to and including the last checkpoint; i.e. checkpoints can act as recovery points.

OK, so checkpoints give me a durable view of the data as of the last checkpoint. In the event of a crash, I lose any data written after that checkpoint.

The WiredTiger journal persists all data modifications between checkpoints. If MongoDB exits between checkpoints, it uses the journal to replay all data modified since the last checkpoint.

That is a little bit of extra durability, seeing as the journal tracks all operations. It's like a list of what has been done that we can reapply after a failure.

The journal record includes any internal write operations caused by the initial write. For example, an update to a document in a collection may result in modifications to the indexes; WiredTiger creates a single journal record that includes both the update operation and its associated index modifications.

It seems like the point of checkpoints was to move fast, at the risk of losing up to a minute of data at a time, whereas the journal is all about accounting for everything. So why have both? Having the journal seems to override any gains that checkpoints were supposed to give.

Solution

Although journal and checkpoint processes both write data to disk, they serve different purposes. Journal writes are fast append-only writes to smaller journal files (up to 100MB), whereas a checkpoint persists changes to the data files (which has more overhead in terms of files and complexity of reconciling updates).
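To make the division of labour concrete, here is a minimal sketch of a checkpoint-plus-journal store in Python. It is a toy model, not WiredTiger's implementation: the class `ToyStore` and its methods are hypothetical names invented for illustration, and "durable" state is simply kept in ordinary Python objects that survive the simulated crash.

```python
import copy


class ToyStore:
    """Toy model of a checkpoint-plus-journal store (hypothetical, not WiredTiger)."""

    def __init__(self):
        self.memory = {}       # working state, lost on a crash
        self.data_files = {}   # durable snapshot as of the last checkpoint
        self.journal = []      # durable append-only log since the last checkpoint

    def write(self, key, value):
        # Cheap sequential append to the journal, then the in-memory update.
        self.journal.append((key, value))
        self.memory[key] = value

    def checkpoint(self):
        # Expensive step: persist a consistent snapshot, then drop the
        # journal entries that the snapshot already covers.
        self.data_files = copy.deepcopy(self.memory)
        self.journal.clear()

    def crash(self):
        # Anything that lived only in memory is gone.
        self.memory = {}

    def recover(self, use_journal=True):
        # Start from the last checkpoint, then replay the journal in order.
        self.memory = copy.deepcopy(self.data_files)
        if use_journal:
            for key, value in self.journal:
                self.memory[key] = value
        return self.memory


store = ToyStore()
store.write("a", 1)
store.checkpoint()
store.write("b", 2)  # written after the checkpoint, so only the journal has it
store.crash()

without_journal = dict(store.recover(use_journal=False))
with_journal = dict(store.recover(use_journal=True))
print(without_journal)  # {'a': 1}
print(with_journal)     # {'a': 1, 'b': 2}
```

In this model the checkpoint alone recovers only `a`, while replaying the journal on top of the checkpoint also recovers `b`. The journal stays small because each checkpoint truncates it.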

The journal requirement for MongoDB 4.0+ replica set members using WiredTiger is motivated by improvements to replication performance and rollback behaviour:

In MongoDB 3.2+, a w:majority write concern implies j:true if j is not specified (acknowledged writes are confirmed by a majority of data-bearing members and will not be rolled back). If the journal is not enabled, the alternative is to either acknowledge in-memory writes for j:true (which could potentially lead to rollback) or trigger a checkpoint (which will force syncing changes to data files on every write). Requiring journaling is a stronger guarantee that acknowledged majority writes will not be rolled back, which is important for applications or server features like Change Streams that return data based on majority read concerns.
A WiredTiger performance improvement in MongoDB 4.0+ reduces the write amplification of some data changes that were previously journaled multiple times (data written to the oplog and to the target collection). It also allows faster rollback without limitations on data size (prior to MongoDB 4.0, the rollback limit was 300 MB of data before manual intervention was required). However, the new Recovery to a Timestamp (RTT) rollback algorithm relies on the journal to recover the oplog to a consistent point. For more technical details, see the Startup Recovery and Rollback descriptions in the MongoDB source repo on GitHub.
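The difference between acknowledging in-memory writes and acknowledging journaled writes can be sketched with a small toy model. This is an illustration only, not the server's actual acknowledgement logic: `Member` and `majority_acknowledged` are hypothetical names, and the model simply counts how many members could acknowledge a single write.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    applied_in_memory: bool  # the write has been applied to in-memory state
    journaled: bool          # the write has also reached the on-disk journal


def majority_acknowledged(members, require_journal=True):
    """Return True if a strict majority of members can acknowledge the write."""
    needed = len(members) // 2 + 1
    if require_journal:
        acks = sum(m.journaled for m in members)
    else:
        acks = sum(m.applied_in_memory for m in members)
    return acks >= needed


# Hypothetical three-member set where only the primary has journaled the write.
members = [
    Member("primary", applied_in_memory=True, journaled=True),
    Member("secondary1", applied_in_memory=True, journaled=False),
    Member("secondary2", applied_in_memory=True, journaled=False),
]

in_memory_ok = majority_acknowledged(members, require_journal=False)
journaled_ok = majority_acknowledged(members, require_journal=True)
print(in_memory_ok)  # True
print(journaled_ok)  # False
```

In this scenario an in-memory acknowledgement would already report a majority, even though a crash of both secondaries could lose the write on them. Requiring the journal delays the acknowledgement until a majority could actually recover the write after a restart.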

If you are not running a replica set, it is still possible (but not recommended) to run with the journal disabled in MongoDB 4.0+.

If you attempt to start a MongoDB 4.0+ replica set member using WiredTiger with the journal disabled, mongod will fail to complete the startup sequence and shut down with a log message like:

Running wiredTiger without journaling in a replica set is not supported. Make sure
you are not using --nojournal and that storage.journal.enabled is not set to 'false'.
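As a rough pre-flight check, the unsupported combination can be detected in a configuration before starting mongod. The sketch below assumes the configuration file has already been parsed into a nested Python dict. The function name `journal_config_problem` is hypothetical, and the defaults it applies when a key is absent (WiredTiger as the storage engine, journal enabled) are assumptions for illustration. It does not inspect command-line flags such as --nojournal.

```python
def journal_config_problem(config):
    """Return a warning string if the parsed config describes a WiredTiger
    replica set member with the journal disabled, otherwise None."""
    storage = config.get("storage", {})
    replication = config.get("replication", {})

    # Assumed defaults when the keys are absent from the config.
    engine = storage.get("engine", "wiredTiger")
    journal_enabled = storage.get("journal", {}).get("enabled", True)
    in_replica_set = bool(replication.get("replSetName"))

    if engine == "wiredTiger" and in_replica_set and not journal_enabled:
        return (
            "storage.journal.enabled is false on a WiredTiger replica set "
            "member; MongoDB 4.0+ will refuse to start"
        )
    return None


# Hypothetical parsed configs.
bad = {
    "storage": {"engine": "wiredTiger", "journal": {"enabled": False}},
    "replication": {"replSetName": "rs0"},
}
standalone = {"storage": {"journal": {"enabled": False}}}

print(journal_config_problem(bad))
print(journal_config_problem(standalone))  # None
```

The standalone config passes the check because, as noted above, running without a journal outside a replica set is still possible, though not recommended.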

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange