MongoDB - Replication Oplog Window has gone below 1 hours

https://dba.stackexchange.com/questions/267442

02-03-2021

Question

I'm working with MongoDB Atlas and have a 3-node M30 cluster with 100 GB of storage.

My current use case is the following:

- A user dispatches one search in my platform
- The platform dispatches this search to other providers (12)
- I get about 2k documents per provider for this search

For the configuration, I'm working with MongoDB Atlas in a 3-member replica set cluster (M30 = 8 GB RAM • 100 GB storage). I also have a TTL index on the only database/collection I'm using, which deletes each document 10 minutes after the value of its date field (searchStartedAt).

What I'm trying to understand is this alert I'm getting ("Replication Oplog Window has gone below 1 hours"). If I got this right, it is how long the primary node can continue receiving data before the secondary nodes get out of sync.

I would like to check if my understanding is right and if there is any tuning possible to avoid this.

Can someone give me a hint? If any other information can help regarding this, please let me know and I'll update the question.

Thanks a lot for any tips regarding it.

Solution

If the replication oplog window goes under one hour, it means that the time difference between the oldest and the newest entries in the oplog is under one hour. This can happen when there are lots of changes in the DB and your oplog is "too small" for that write volume.

No, this is not fatal. It just means that your secondary nodes cannot fall behind by more than your oplog window. For example, if your window is 55 minutes, you cannot stop a secondary node for more than 55 minutes, because it can no longer catch up from the oplog and needs to do a full (initial) sync. And if that full sync itself takes more than 55 minutes, it may not be able to complete either.
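To make the definition concrete, here is a minimal sketch of the arithmetic in plain Python. It does not talk to MongoDB; the two timestamps are made-up values standing in for the oldest and newest oplog entries (on a real replica set, `rs.printReplicationInfo()` in the shell reports them), and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def oplog_window(first_entry: datetime, last_entry: datetime) -> timedelta:
    """Time span covered by the oplog: newest entry minus oldest entry."""
    return last_entry - first_entry


def can_catch_up(window: timedelta, downtime: timedelta) -> bool:
    """A secondary can resume from the oplog only if it was offline
    for less time than the oplog window covers."""
    return downtime < window


# Assumed example values, not taken from a real cluster.
first = datetime(2021, 3, 2, 10, 0, 0)   # oldest oplog entry
last = datetime(2021, 3, 2, 10, 55, 0)   # newest oplog entry

window = oplog_window(first, last)
print(window)                                        # 0:55:00
print(can_catch_up(window, timedelta(minutes=30)))   # True
print(can_catch_up(window, timedelta(minutes=90)))   # False
```

With a 55-minute window, a secondary that was down for 30 minutes can still replay the missed operations, while one that was down for 90 minutes would need a full sync.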

What you can do is increase the oplog size. If you double your oplog size, you get roughly double the window, assuming the write rate stays the same.
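As a rough sizing aid, the relationship between oplog size and window can be sketched like this. The numbers (a 4000 MB oplog filling at 5000 MB per hour) are assumptions for illustration only, not measurements from the cluster in the question, and the function names are hypothetical. The model also assumes a constant write rate, which real workloads rarely have.

```python
def estimated_window_hours(oplog_size_mb: float, oplog_mb_per_hour: float) -> float:
    """Approximate oplog window: how many hours of writes fit in the oplog."""
    return oplog_size_mb / oplog_mb_per_hour


def required_oplog_size_mb(target_hours: float, oplog_mb_per_hour: float) -> float:
    """Approximate oplog size needed to cover a target window."""
    return target_hours * oplog_mb_per_hour


# Assumed example values.
rate = 5000.0   # MB of oplog entries generated per hour
size = 4000.0   # current oplog size in MB

print(estimated_window_hours(size, rate))       # 0.8  (48 minutes, below the 1 hour alert)
print(estimated_window_hours(size * 2, rate))   # 1.6  (doubling the size doubles the window)
print(required_oplog_size_mb(3.0, rate))        # 15000.0 MB for a 3 hour window
```

On a self-managed replica set the oplog can be resized with the `replSetResizeOplog` admin command (size given in megabytes); on Atlas the oplog size is a cluster configuration setting. Keep in mind that inserts and the TTL deletes that later remove those documents both produce oplog entries, so the real write rate should be measured rather than guessed.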

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange