MongoDB Primary Member of Replica Set High Memory Usage
10-10-2020
Question
I currently have a sharded cluster up and running, and I have noticed that the memory usage of the primary member of each replica set is very high: it is 8 GB, although I started each member with the following:
mongod --auth --bind_ip_all --shardsvr --replSet a --smallfiles --oplogSize 1024 --keyFile /file/somewhere/
I thought (possibly naively) that the oplogSize would limit the amount of memory used.
Any guidance in how to solve this or highlighting the error of my ways is much appreciated.
Solution
Introduction
The oplog has very little to do with memory consumption. It is a capped collection used as a sort of write-ahead log for operations to be replicated to other replica set members.
In general, MongoDB uses up to around 85% (give or take) of memory unless told otherwise. This memory is used to keep the indices and the "working set" (copies of the most recently used documents) in RAM to ensure optimal performance. While it is technically possible to limit the RAM used by MongoDB, it is a Very Bad Idea™ to do so, as you severely limit MongoDB's performance and make it basically impossible to detect when to scale out or up because of insufficient RAM.
TL;DR: If you have to ask how to limit the RAM utilised by MongoDB, you probably should not limit it, as you are unable to judge the side effects this step will introduce.
Limiting MongoDB's memory consumption
You basically have three options: limit the cache size of the WiredTiger storage engine, use cgroups to limit the memory mongod can request from the OS, or use Docker to do so (which makes it a bit easier, but under the hood Docker uses cgroups as well, iirc).
Option 1: Limit WiredTiger's cache size
Add the following option to your configuration file (I assume it is in YAML format):

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: <number>
where <number> is the maximum amount of RAM MongoDB is allowed to use for WiredTiger's cache. Note that fiddling with this parameter can severely impact performance (on the other hand, limiting MongoDB's memory consumption always will). Please also note that this does not limit the memory used by mongod itself (for example, each connection gets a small stack assigned).
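For reference, when cacheSizeGB is not set, WiredTiger defaults to the larger of 50% of (RAM − 1 GB) and 256 MB. A minimal sketch of that calculation (the function name is mine, the formula is from the MongoDB documentation):

```shell
# Sketch: WiredTiger's default cache target when cacheSizeGB is unset,
# i.e. max(50% of (RAM - 1 GB), 256 MB). Input is the host's RAM in GB.
wt_default_cache_gb() {
  awk -v r="$1" 'BEGIN {
    c = (r - 1) / 2
    if (c < 0.25) c = 0.25   # floor of 256 MB
    printf "%.2f", c
  }'
}

wt_default_cache_gb 8    # a host with 8 GB RAM -> 3.50
```

On an 8 GB host that is 3.5 GB of cache, which together with connection stacks, in-flight operations, and filesystem cache easily explains the memory usage observed in the question.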
Option 2: Using cgroups to limit the overall memory consumption of mongod
As a root user, first ensure that cgroups are enabled:
$ lscgroup
cpuset:/
cpu:/
cpuacct:/
memory:/
devices:/
freezer:/
net_cls:/
blkio:/
Assuming cgroups are available, you can now configure a control group for MongoDB's memory consumption in /etc/cgconfig.conf
:
group mongodb{
memory {
memory.limit_in_bytes = 512m;
}
}
After you have done so, you need to restart the cgconfig service. Do not simply copy and paste the config above: with 512 MB, MongoDB will barely run (if at all). Adjust the memory limit to your needs; use at least 2 GB of RAM.
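Note that cgroups interprets the size suffixes as powers of 1024, which is what you will see when you read the limit back from /sys/fs/cgroup/memory/<group>/memory.limit_in_bytes. A small sketch (the helper name is mine) showing what a limit string expands to:

```shell
# Convert a cgroup-style size string (k/m/g suffix, powers of 1024) to bytes,
# mirroring how memory.limit_in_bytes reports the configured value back.
cg_to_bytes() {
  case "$1" in
    *k) echo $(( ${1%k} * 1024 )) ;;
    *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
    *g) echo $(( ${1%g} * 1024 * 1024 * 1024 )) ;;
    *)  echo "$1" ;;
  esac
}

cg_to_bytes 512m   # -> 536870912
cg_to_bytes 2g     # -> 2147483648
```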
Next, you need to assign mongod to the control group you just created. To do so, you need to edit /etc/cgrules.conf:

*:mongod memory mongodb/

where * denotes that this rule applies regardless of who started mongod; the limit will be applied according to the rules of the control group mongodb/. As a last step, you now need to restart the cgred and MongoDB services. mongod should now use only the specified amount of RAM, for better or worse.
Option 3: Use Docker to limit mongod's overall memory consumption
Identify which version of MongoDB you are currently running:

$ mongod --version
db version v3.4.10
git version: 078f28920cb24de0dd479b5ea6c66c644f6326e9
OpenSSL version: OpenSSL 1.0.2n  7 Dec 2017
allocator: system
modules: none
build environment:
    distarch: x86_64
    target_arch: x86_64
Your output may differ, but we only need the db version, namely the major and minor version. In this example, it is "3.4".

Pull a suitable docker image:
$ docker pull mongo:3.4
You should pull the docker image for the version you determined earlier and use the pulled image in the next step.
Run the docker image with the appropriate parameters:

$ docker run -d --name mongod --memory=512m \
    --mount type=bind,src=/path/to/your/datafiles,dst=/data/db \
    --mount type=bind,src=/file/somewhere/,dst=/key.file,readonly \
    mongo:3.4 <yourOptions>
A few things to note here: the first --mount makes your existing datafiles accessible from inside the container, while the second --mount does the same for your keyfile. You need to adjust your mongod options accordingly, namely the --keyFile option, which must point to the destination you mounted your keyfile to. See the Docker documentation and the README of the mongo Docker image for details.
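The same setup can be written down as a compose file, which makes the memory limit part of the service definition. A hypothetical sketch, assuming the v2 compose file format (paths, the 512m limit, and the mongod flags are the example values from above; adjust to your needs):

```yaml
# Hypothetical docker-compose sketch mirroring the docker run invocation above.
version: "2"
services:
  mongod:
    image: mongo:3.4
    mem_limit: 512m          # same effect as docker run --memory=512m
    volumes:
      - /path/to/your/datafiles:/data/db
      - /file/somewhere/:/key.file:ro
    command: --auth --shardsvr --replSet a --keyFile /key.file
```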
Conclusion
You have a sharded cluster and you want to limit the memory consumption of the individual shard members.
If we are talking about a production system, this is a Very Bad Idea™: either you have other services running on the machines running your mongods (which makes the services compete for resources under heavy load), or you artificially limit the performance MongoDB can provide (by using the methods described above). Either way, this is bad systems design. Why did you shard in the first place? Sharding is MongoDB's method of load balancing and scaling out (used when a limiting factor, say RAM, can no longer be scaled up because the bang you get for the buck is insufficient). I have a mantra which I repeat to customers (and, to be honest, occasionally to myself):
MongoDB instances bearing production data should run on dedicated machines. No exceptions!
Depending on the reasons you sharded in the first place and how many shards you have, it may well be that you didn't even need to shard if you ran your cluster on dedicated machines. Do the math.
And even if it were a good idea to have your cluster nodes running other services, they are obviously underprovisioned. Given the price of RAM compared with the cost of reduced performance, it is basically a no-brainer to scale your machines up with a decent amount of RAM rather than limiting it artificially to enforce a system design that is bad in the first place.
My advice is to not follow any of the above approaches. Instead, run your data-bearing MongoDB instances on dedicated machines. Scale them up as long as you get an according bang for the buck RAM- and IO-wise (CPU is rarely an issue) before you shard. As of the time of this writing, that would be between 128 and 256 GB of RAM and a RAID 0 (in case you have a replica set, which you do have, don't you?) or RAID 10 (in case your shards are not replica sets - shame on you ;) ) with SSDs. Shard only if:
- you have too many IOPS for a single machine to handle
- you need more RAM than you could fit into your replica set members with a good bang for the buck
- you have more data than a single machine can persist.
hth
PS: Do not blame me or MongoDB if your performance goes south after you limit the RAM for your mongods.
OTHER TIPS
As per the MongoDB documentation: when you start a replica set member for the first time, MongoDB creates an oplog of a default size.
For Unix and Windows systems
The default oplog size depends on the storage engine:
Storage Engine             Default Oplog Size       Lower Bound   Upper Bound
In-Memory Storage Engine   5% of physical memory    50 MB         50 GB
WiredTiger Storage Engine  5% of free disk space    990 MB        50 GB
MMAPv1 Storage Engine      5% of free disk space    990 MB        50 GB
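The WiredTiger row boils down to "5% of free disk space, clamped to the 990 MB / 50 GB bounds". A small sketch of that rule (the helper name is mine; everything is in megabytes):

```shell
# Sketch: default WiredTiger/MMAPv1 oplog size = 5% of free disk space,
# clamped to [990 MB, 50 GB]. Input is free disk space in MB; output in MB.
default_oplog_mb() {
  awk -v free="$1" 'BEGIN {
    s = free * 0.05
    if (s < 990)   s = 990     # lower bound
    if (s > 51200) s = 51200   # upper bound, 50 GB
    printf "%d", s
  }'
}

default_oplog_mb 1024000   # ~1 TB free  -> 51200 (capped at 50 GB)
default_oplog_mb 10240     # 10 GB free  -> 990 (floor applies)
```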
For 64-bit macOS systems
The default oplog size is 192 MB of either physical memory or free disk space, depending on the storage engine:
Storage Engine             Default Oplog Size
In-Memory Storage Engine   192 MB of physical memory
WiredTiger Storage Engine  192 MB of free disk space
MMAPv1 Storage Engine      192 MB of free disk space
In most cases, the default oplog size is sufficient. For example, if an oplog is 5% of free disk space and fills up in 24 hours of operations, then secondaries can stop copying entries from the oplog for up to 24 hours without becoming too stale to continue replicating. However, most replica sets have much lower operation volumes, and their oplogs can hold much higher numbers of operations.
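That replication window is simple arithmetic: oplog size divided by the rate at which oplog entries are written. A sketch, assuming a steady write rate (both the function name and the example rate are mine):

```shell
# Rough oplog window in hours: oplog size divided by the oplog write rate.
# Size is in MB, rate in MB per hour; assumes the rate is roughly constant.
oplog_window_hours() {
  awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f", size / rate }'
}

oplog_window_hours 51200 100   # 50 GB oplog at 100 MB/h -> 512.0 hours
```

In practice you would read the actual window from rs.printReplicationInfo() rather than estimating it, but the calculation shows why low-volume replica sets get very long windows out of the default size.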
Note: New in version 3.6.
This procedure changes the size of the oplog on each member of a replica set using the replSetResizeOplog command, starting with the secondary members before proceeding to the primary.
Important: You can only run replSetResizeOplog on replica set members running the WiredTiger storage engine.
Connect to the replica set member
Connect to the replica set member using the mongo shell:
mongo --host <hostname>:<port>
Note: If the replica set enforces authentication, you must authenticate as a user with privileges to modify the local database, such as the clusterManager or clusterAdmin role.
Verify the current size of the oplog
use local
db.oplog.rs.stats().maxSize
The maxSize field displays the collection size in bytes.
Change the oplog size of the replica set member
To change the size, run replSetResizeOplog, passing the desired size in megabytes as the size parameter. The specified size must be greater than 990 megabytes.
The following operation changes the oplog size of the replica set member to 16 gigabytes, or 16000 megabytes.
db.adminCommand({replSetResizeOplog: 1, size: 16000})
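Afterwards, db.oplog.rs.stats().maxSize should report the new size in bytes. The expected value is easy to compute, assuming the server treats the megabytes parameter as mebibytes (1024 × 1024 bytes; the helper name is mine):

```shell
# Sketch: expected oplog.rs maxSize in bytes after resizing to <mb> megabytes,
# assuming megabytes here means mebibytes (1024 * 1024 bytes).
expected_max_size_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

expected_max_size_bytes 16000   # -> 16777216000
```

Comparing this number against the maxSize you read back in the verification step above is a quick sanity check that the resize took effect on each member.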