Question

I want to convert my MongoDB 3.6 standalone server to a replica set so I can watch a collection using change streams, which require an oplog. I only want to use change streams, so I will keep this as a single-member replica set.

The problem is that my standalone instance is kind of big (~400 GB) and I do not have a backup of it.

I have a few questions:

  1. Is such a conversion considered a safe operation? Are there any caveats considering the time of "conversion", the additional memory needed etc. or does that only apply if I connect more nodes?
  2. What are the ways of backing up such big instances? I have heard of and read about Filesystem Snapshots but I'm not sure what they are and how to handle them.

Solution

Is such a conversion considered a safe operation?

Converting a standalone node to a replica set is a straightforward procedure, but if you do not have a backup of your deployment (and this data is important), I would definitely prioritise creating and testing a backup first. If you only have a single copy of your data (the one being used!), you will have very limited (and possibly painful) recovery options if something unwelcome happens to your data, whether accidentally or intentionally.

Are there any caveats considering the time of "conversion", the additional memory needed etc. or does that only apply if I connect more nodes?

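A replica set member will use some extra storage space and I/O for the operation log (oplog), which stores a rolling record of write operations for use cases like replication and change streams. The oplog is a capped collection; with the WiredTiger storage engine its default size is 5% of free disk space, bounded between 990 MB and 50 GB. The conversion process is essentially restarting mongod with the --replSet option (or the equivalent replication.replSetName setting in the configuration file) and then initialising the oplog and replica set configuration using rs.initiate().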
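The two conversion steps can be sketched in Python as follows. This is a minimal sketch, not a definitive procedure: the replica set name rs0, the /data/db path, the host localhost:27017 and the helper function names are all illustrative assumptions, and the last helper assumes the PyMongo driver is installed. It sends the replSetInitiate command, which is the server command behind the shell helper rs.initiate().

```python
def mongod_command(dbpath, repl_set_name, port=27017):
    """Build the argv for restarting mongod as a replica set member.

    The data directory is reused as-is; only the --replSet option is new.
    """
    return [
        "mongod",
        "--dbpath", dbpath,
        "--port", str(port),
        "--replSet", repl_set_name,
    ]


def single_member_config(repl_set_name, host="localhost:27017"):
    """Replica set configuration document with exactly one member."""
    return {"_id": repl_set_name, "members": [{"_id": 0, "host": host}]}


def initiate(repl_set_name, host="localhost:27017"):
    """Initiate the replica set on the restarted mongod (assumes PyMongo)."""
    # Imported lazily so the helpers above work without the driver installed.
    from pymongo import MongoClient

    # A direct connection is used because the set has no primary yet.
    client = MongoClient(host, directConnection=True)
    return client.admin.command(
        "replSetInitiate", single_member_config(repl_set_name, host)
    )


if __name__ == "__main__":
    # Print the restart command and config; run initiate("rs0") afterwards.
    print(" ".join(mongod_command("/data/db", "rs0")))
    print(single_member_config("rs0"))
```

The host value in the configuration should be an address that clients can resolve, since drivers connecting with a replica set name discover members from this configuration.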

What are the ways of backing up such big instances? I have heard of and read about Filesystem Snapshots but I'm not sure what they are and how to handle them.

The MongoDB documentation describes supported Backup Methods, including Back Up and Restore with Filesystem Snapshots. Filesystem snapshots use system-level tools that vary depending on the operating system and filesystem used for your deployment.

For example, Linux has LVM (Logical Volume Manager) which enables taking a consistent backup of a block device. The initial snapshot will have some more noticeable overhead, but subsequent snapshots are generally quick. However, snapshots typically depend on the same storage infrastructure as the original disk so it is essential that you have a plan for archiving snapshots and saving backups elsewhere. If you are using a cloud provider (Amazon, Google Cloud, Azure, ...) with network data volumes, these also have snapshot APIs.
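As a rough illustration of the LVM approach, the sketch below builds the commands for creating a snapshot, archiving it to a compressed file, and removing the snapshot afterwards. The volume group vg0, logical volume mongodb, snapshot name mdb-snap01, the 100G size and the /backup path are hypothetical placeholders, and the sketch assumes the data files and journal live on the same logical volume so that the snapshot is consistent. By default it only prints the commands; actually running them requires root privileges.

```python
import shlex
import subprocess


def snapshot_commands(vg, lv, snap_name, snap_size="100G",
                      archive_path="/backup/mdb-snap01.gz"):
    """Return the commands for an LVM snapshot backup, in order."""
    origin = f"/dev/{vg}/{lv}"
    snap = f"/dev/{vg}/{snap_name}"
    # Copy the snapshot block device off the volume group as a gzip archive.
    pipeline = (
        f"dd if={shlex.quote(snap)} bs=4M | gzip > {shlex.quote(archive_path)}"
    )
    return [
        # --size is the space reserved for blocks that change on the origin
        # while the snapshot exists, not the size of the origin volume.
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap_name, origin],
        ["sh", "-c", pipeline],
        # Drop the snapshot once it has been archived elsewhere.
        ["lvremove", "--force", snap],
    ]


def run(commands, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(snapshot_commands("vg0", "mongodb", "mdb-snap01"))
```

The archive step is what moves the backup off the original storage; in practice the target path would be on a different disk or a remote system.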

Licensed under: CC-BY-SA with attribution