MongoDB Primary Member of Replica Set High Memory Usage
10-10-2020
Question
I currently have a sharded cluster up and running, and I have noticed that the memory usage of the primary member of each replica set is very high: it is 8 GB, although I started each member with the following:
mongod --auth --bind_ip_all --shardsvr --replSet a --smallfiles --oplogSize 1024 --keyFile /file/somewhere/
I thought (possibly naively) that the oplogSize would limit the amount of memory used.
Any guidance in how to solve this or highlighting the error of my ways is much appreciated.
Solution
Introduction
The oplog has very little to do with memory consumption. It is a capped collection used as a sort of write-ahead log for operations to be replicated to other replica set members.
In general, MongoDB uses up to around 85% (give or take) of memory unless told otherwise. This memory is used to keep the indices and the "working set" (copies of the most recently used documents) in RAM to ensure optimal performance. While it is technically possible to limit the RAM used by MongoDB, it is a Very Bad Idea™ to do so, as you severely limit MongoDB's performance and make it basically impossible to detect when to scale out or up because of insufficient RAM.
TL;DR: If you have to ask how to limit the RAM utilised by MongoDB, you probably should not limit it, as you are unable to judge the side effects this step will introduce.
Limiting MongoDB's memory consumption
You basically have three options: limit the cache size of the WiredTiger storage engine, use cgroups to limit the memory mongod can request from the OS, or use Docker to do so (which makes it a bit easier, but under the hood Docker uses cgroups as well, iirc).
Option 1: Limit WiredTiger's cache size
Add the following option to your configuration file (I assume it is in YAML format):

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: <number>
where <number> is the maximum amount of RAM MongoDB is allowed to use for WiredTiger's cache. Note that fiddling with this parameter can severely impact performance (on the other hand, limiting MongoDB's memory consumption always will). Please also note that this does not limit the memory used by mongod itself (for example, each connection gets a small stack assigned).
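For reference, when cacheSizeGB is not set, WiredTiger defaults to the larger of 50% of (RAM − 1 GB) and 256 MB. A minimal sketch of that calculation (the function name is mine, the formula is from the MongoDB documentation):

```shell
# Sketch: WiredTiger's default cache target when cacheSizeGB is unset,
# i.e. max(50% of (RAM - 1 GB), 256 MB). Input is the host's RAM in GB.
wt_default_cache_gb() {
  awk -v r="$1" 'BEGIN {
    c = (r - 1) / 2
    if (c < 0.25) c = 0.25   # floor of 256 MB
    printf "%.2f", c
  }'
}

wt_default_cache_gb 8    # a host with 8 GB RAM -> 3.50
```

On an 8 GB host that is 3.5 GB of cache, which together with connection stacks, in-flight operations, and filesystem cache easily explains the memory usage observed in the question.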
Option 2: Using cgroups to limit the overall memory consumption of mongod
As a root user, first ensure that cgroups are enabled:
$ lscgroup
cpuset:/
cpu:/
cpuacct:/
memory:/
devices:/
freezer:/
net_cls:/
blkio:/
Assuming cgroups are available, you can now configure a control group for MongoDB's memory consumption in /etc/cgconfig.conf
:
group mongodb{
memory {
memory.limit_in_bytes = 512m;
}
}
After you have done so, you need to restart the cgconfig service. Do not simply copy and paste the config above: with 512 MB, MongoDB will barely run (if at all). Adjust the memory limit to your needs; use at least 2 GB of RAM.
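Note that cgroups interprets the size suffixes as powers of 1024, which is what you will see when you read the limit back from /sys/fs/cgroup/memory/<group>/memory.limit_in_bytes. A small sketch (the helper name is mine) showing what a limit string expands to:

```shell
# Convert a cgroup-style size string (k/m/g suffix, powers of 1024) to bytes,
# mirroring how memory.limit_in_bytes reports the configured value back.
cg_to_bytes() {
  case "$1" in
    *k) echo $(( ${1%k} * 1024 )) ;;
    *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
    *g) echo $(( ${1%g} * 1024 * 1024 * 1024 )) ;;
    *)  echo "$1" ;;
  esac
}

cg_to_bytes 512m   # -> 536870912
cg_to_bytes 2g     # -> 2147483648
```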
Next, you need to assign mongod to the control group you just created. To do so, you need to edit /etc/cgrules.conf:

*:mongod memory mongodb/

where * denotes that this rule applies regardless of who started mongod; the limit will be applied according to the rules of the control group mongodb/. As a last step, you now need to restart the cgred and MongoDB services. mongod should now use only the specified amount of RAM, for better or worse.
Option 3: Use Docker to limit mongod's overall memory consumption
Identify which version of MongoDB you are currently running:

$ mongod --version
db version v3.4.10
git version: 078f28920cb24de0dd479b5ea6c66c644f6326e9
OpenSSL version: OpenSSL 1.0.2n  7 Dec 2017
allocator: system
modules: none
build environment:
    distarch: x86_64
    target_arch: x86_64
Your output may differ, but we only need the db version, namely the major and minor version. In this example, it is "3.4".

Pull a suitable docker image:
$ docker pull mongo:3.4
You should pull the docker image for the version you determined earlier and use the pulled image in the next step.
Run the docker image with the appropriate parameters:

$ docker run -d --name mongod --memory=512m \
    --mount type=bind,src=/path/to/your/datafiles,dst=/data/db \
    --mount type=bind,src=/file/somewhere/,dst=/key.file,readonly \
    mongo:3.4 <yourOptions>
A few things to note here: the first --mount makes your existing datafiles accessible from inside the container, while the second --mount does the same for your keyfile. You need to adjust your mongod options accordingly, namely the --keyFile option, which must point to the destination you mounted your keyfile to. See the Docker documentation and the README of the mongo Docker image for details.
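The same setup can be written down as a compose file, which makes the memory limit part of the service definition. A hypothetical sketch, assuming the v2 compose file format (paths, the 512m limit, and the mongod flags are the example values from above; adjust to your needs):

```yaml
# Hypothetical docker-compose sketch mirroring the docker run invocation above.
version: "2"
services:
  mongod:
    image: mongo:3.4
    mem_limit: 512m          # same effect as docker run --memory=512m
    volumes:
      - /path/to/your/datafiles:/data/db
      - /file/somewhere/:/key.file:ro
    command: --auth --shardsvr --replSet a --keyFile /key.file
```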
Conclusion
You have a sharded cluster and you want to limit the memory consumption of the individual shard members.
If we are talking about a production system, this is a Very Bad Idea™: either you have other services running on the machines running your mongods (which makes the services compete for resources under heavy load), or you artificially limit the performance MongoDB can provide (by using the methods described above). Either way, this is bad systems design. Why did you shard in the first place? Sharding is MongoDB's method of load balancing and scaling out (used when a limiting factor, say RAM, can no longer be scaled up because the bang you get for the buck is insufficient). I have a mantra which I repeat to customers (and, to be honest, occasionally to myself):
MongoDB instances bearing production data should run on dedicated machines. No exceptions!
Depending on the reasons you sharded in the first place and how many shards you have, it may well be that you didn't even need to shard if you ran your cluster on dedicated machines. Do the math.
And even if it were a good idea to have your cluster nodes running other services, they are obviously underprovisioned. Given the price of RAM compared with the cost of reduced performance, it is basically a no-brainer to scale your machines up with a decent amount of RAM rather than limiting it artificially to enforce a system design that is bad in the first place.
My advice is to not follow any of the above approaches. Instead, run your data-bearing MongoDB instances on dedicated machines. Scale them up as long as you get an according bang for the buck RAM- and IO-wise (CPU is rarely an issue) before you shard. As of the time of this writing, that would be between 128 and 256 GB of RAM and a RAID 0 (in case you have a replica set, which you do have, don't you?) or RAID 10 (in case your shards are not replica sets - shame on you ;) ) with SSDs. Shard only if:
- you have too many IOPS for a single machine to handle
- you need more RAM than you could fit into your replica set members with a good bang for the buck
- you have more data than a single machine can persist.
hth
PS: Do not blame me or MongoDB if your performance goes south after you limit the RAM for your mongods.
OTHER TIPS
As per the MongoDB documentation: when you start a replica set member for the first time, MongoDB creates an oplog of a default size.
For Unix and Windows systems
The default oplog size depends on the storage engine:
Storage Engine             Default Oplog Size       Lower Bound   Upper Bound
In-Memory Storage Engine   5% of physical memory    50 MB         50 GB
WiredTiger Storage Engine  5% of free disk space    990 MB        50 GB
MMAPv1 Storage Engine      5% of free disk space    990 MB        50 GB
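The WiredTiger row boils down to "5% of free disk space, clamped to the 990 MB / 50 GB bounds". A small sketch of that rule (the helper name is mine; everything is in megabytes):

```shell
# Sketch: default WiredTiger/MMAPv1 oplog size = 5% of free disk space,
# clamped to [990 MB, 50 GB]. Input is free disk space in MB; output in MB.
default_oplog_mb() {
  awk -v free="$1" 'BEGIN {
    s = free * 0.05
    if (s < 990)   s = 990     # lower bound
    if (s > 51200) s = 51200   # upper bound, 50 GB
    printf "%d", s
  }'
}

default_oplog_mb 1024000   # ~1 TB free  -> 51200 (capped at 50 GB)
default_oplog_mb 10240     # 10 GB free  -> 990 (floor applies)
```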
For 64-bit macOS systems
The default oplog size is 192 MB of either physical memory or free disk space, depending on the storage engine:
Storage Engine             Default Oplog Size
In-Memory Storage Engine   192 MB of physical memory
WiredTiger Storage Engine  192 MB of free disk space
MMAPv1 Storage Engine      192 MB of free disk space
In most cases, the default oplog size is sufficient. For example, if an oplog is 5% of free disk space and fills up in 24 hours of operations, then secondaries can stop copying entries from the oplog for up to 24 hours without becoming too stale to continue replicating. However, most replica sets have much lower operation volumes, and their oplogs can hold much higher numbers of operations.
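That replication window is simple arithmetic: oplog size divided by the rate at which oplog entries are written. A sketch, assuming a steady write rate (both the function name and the example rate are mine):

```shell
# Rough oplog window in hours: oplog size divided by the oplog write rate.
# Size is in MB, rate in MB per hour; assumes the rate is roughly constant.
oplog_window_hours() {
  awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f", size / rate }'
}

oplog_window_hours 51200 100   # 50 GB oplog at 100 MB/h -> 512.0 hours
```

In practice you would read the actual window from rs.printReplicationInfo() rather than estimating it, but the calculation shows why low-volume replica sets get very long windows out of the default size.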
Note: New in version 3.6.
This procedure changes the size of the oplog on each member of a replica set using the replSetResizeOplog command, starting with the secondary members before proceeding to the primary.
Important: You can only run replSetResizeOplog on replica set members running the WiredTiger storage engine.
Connect to the replica set member
Connect to the replica set member using the mongo shell:
mongo --host <hostname>:<port>
Note: If the replica set enforces authentication, you must authenticate as a user with privileges to modify the local database, such as the clusterManager or clusterAdmin role.
Verify the current size of the oplog
use local
db.oplog.rs.stats().maxSize
The maxSize field displays the collection size in bytes.
Change the oplog size of the replica set member
To change the size, run replSetResizeOplog, passing the desired size in megabytes as the size parameter. The specified size must be greater than 990 megabytes.
The following operation changes the oplog size of the replica set member to 16 gigabytes, or 16000 megabytes.
db.adminCommand({replSetResizeOplog: 1, size: 16000})
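Afterwards, db.oplog.rs.stats().maxSize should report the new size in bytes. The expected value is easy to compute, assuming the server treats the megabytes parameter as mebibytes (1024 × 1024 bytes; the helper name is mine):

```shell
# Sketch: expected oplog.rs maxSize in bytes after resizing to <mb> megabytes,
# assuming megabytes here means mebibytes (1024 * 1024 bytes).
expected_max_size_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

expected_max_size_bytes 16000   # -> 16777216000
```

Comparing this number against the maxSize you read back in the verification step above is a quick sanity check that the resize took effect on each member.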