Question

I manage a MongoDB replica set for one of my customers, and sometimes he sends me a ticket (email) containing only this:

WARNING Date/Time: Tue Jul 4 02:39:33 GMT 2017
Summary: WARNING - Queries / Sec: 9.674419

How should I interpret this kind of message? My guess is that some slow queries are running, so the first thing I do is check the log. What I found there is that we have some aggregate commands that take 500-600 ms to execute. Here is an example:

2017-07-04T04:08:52.205Z I COMMAND  [conn2104435] 
command databaseName.cards_PWCD900004 
command: aggregate { aggregate: "cards_PWCD900004", 
pipeline: [ { $unwind: "$virtualCards" }, 
{ $project: { cardProfileId: 1, virtualCards.state: 1, 
virtualCards.device: 1, 
virtualCards.primaryDpanId: { $ifNull: [ "$virtualCards.primaryDpanId", null ] } } }, 
{ $group: { _id: { cardProductId: "$cardProfileId", 
virtualCardsState: "$virtualCards.state", 
deviceId: "$virtualCards.device" }, tokensNumber: { $sum: 1 }, 
virtualCardsNumber: { $sum: { $cond: { if: { $eq: [ "$virtualCards.primaryDpanId", null ] }, 
then: 1, else: 0 } } } } }, { $group: { _id: { issuerName: "APLUS", 
cardProductId: "$_id.cardProductId", 
virtualCardsState: "$_id.virtualCardsState" }, 
devicesNumber: { $sum: 1 }, virtualCardsNumber: { $sum: "$virtualCardsNumber" }, 
tokensNumber: { $sum: "$tokensNumber" } } } ] } 
keyUpdates:0 writeConflicts:0 numYields:219 reslen:30763 
locks:{ Global: { acquireCount: { r: 460 } }, 
Database: { acquireCount: { r: 230 } }, 
Collection: { acquireCount: { r: 230 } } } protocol:op_query 523ms

What I would like to know is how to troubleshoot such a thing. I could launch mongotop or mongostat, but if the issue has just finished I will see nothing. I could also use db.serverStatus(), but I don't really know which variables are the most significant.

database size: 2 GB
SSL: true

About mongo.conf, everything is at its default values. Here is more information about the hardware:

RAM: 8 GB
CPU: 2 processors at 2.3 GHz each

What best practices should an administrator follow to check that everything is working well on their database?

If you have any questions, please don't hesitate to ask.


Solution

Let's first look at a few of the terms in that output to understand it better.

numYields

The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation. See also: Does a read or write operation ever yield the lock?

reslen

The length in bytes of the operation's result document. A large reslen (response length) can affect performance. To limit the size of the result document for a query operation, you can use any of the following (a short shell sketch follows the list):

  • Projections
  • The limit() method
  • The batchSize() method
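
As a rough illustration, here is how those three options look in the mongo shell; the collection name, filter and field names below are made up purely for the example:

  // Hypothetical collection and fields, just to illustrate the three options above.
  db.cards.find(
      { cardProfileId: "PWCD900004" },       // filter
      { cardProfileId: 1, virtualCards: 1 }  // projection: return only the fields you need
  )
  .limit(100)      // cap the number of documents returned
  .batchSize(20);  // cap how many documents come back per network batch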

acquireCount

Number of times the operation acquired the lock in the specified mode.

How do you troubleshoot something like this?

I am taking a guess here as I cannot see your query plan. I suggest using the explain method and looking at the plan, focusing especially on index usage and on whether stages scan the whole collection or use an index.
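
For example, you can ask the server for the plan of the aggregation from your log instead of its results (the pipeline is shortened here; pass your full pipeline). This works on MongoDB 2.6+:

  // Returns the query plan; look for COLLSCAN vs. IXSCAN in the initial cursor stage.
  db.cards_PWCD900004.aggregate(
      [
          { $unwind: "$virtualCards" },
          { $project: { cardProfileId: 1, "virtualCards.state": 1 } }
      ],
      { explain: true }
  );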

See this answer about high number of yields.

Monitor your instance for some key metrics, which will help you tune your long-running queries. In your case I would especially focus on index misses.

  • mongotop - shows how much time was spent reading or writing each collection over the last second
  • mongostat - a brilliant live debugging tool; gives a view of all your connected MongoDB instances
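
If the slowdown is still happening when you look (rather than already finished), typical invocations are, for example:

  # Per-collection read/write time, sampled every 5 seconds
  mongotop 5

  # One status line per second, 30 samples (add --host to point at another replica set member)
  mongostat -n 30 1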

OTHER TIPS

What I did was run db.serverStatus() and look for the most significant variables, in your words.

> db.serverStatus( { metrics: 0, locks: 0, tcmalloc: 0, opcounters: 1 } )

I found opcounters to be the clarifying key.

My server was getting overloaded even with a freshly started instance, and I got this result (these numbers are from 2 minutes after boot, in a dev environment):

"opcounters" : {                                                               
  "insert" : 20,                                                 
  "query" : 14761124,                                            
  "update" : 1727,                                                                                                        
  "delete" : 2,                                                  
  "getmore" : 0,                                                 
  "command" : 24528                                                                                                       
}

This is an app with low traffic and virtually no use, so it seemed odd that there were over 14 million queries in such a short time.

That let me diagnose that some module had a while(true) loop querying the database, and deal with it.

The reason I got to this question was that I was monitoring my server (htop) and saw that some module was constantly keeping 60% of the processor busy, with mongod taking another 30%. With that data I searched through the code and found that a config file was incorrect, making the default value for the delay between queries fall back to 0. A delay of 0 means querying all the data in the database virtually instantly, over and over, hence overloading the database and the server.

The point is that you can troubleshoot MongoDB by reading through these variables and finding the one that fits your case best.
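
A quick way to see what there is to read, in the mongo shell (then drill into a section such as db.serverStatus().opcounters or db.serverStatus().connections):

  // Print the top-level sections available in the serverStatus output.
  Object.keys(db.serverStatus()).forEach(function (section) {
      print(section);
  });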

For your requests/sec problem, you could write a script that queries the server every second, summing the opcounters for insert, query, update and delete, and just counting. Check this answer on how to do that; a rough sketch follows.
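
A minimal sketch of such a script for the mongo shell could look like the following; the one-second interval and the set of counters summed are assumptions to adjust to your needs:

  // Poll opcounters once per second and print the per-second delta.
  var prev = null;
  while (true) {
      var oc = db.serverStatus().opcounters;
      var total = oc.insert + oc.query + oc.update + oc["delete"];
      if (prev !== null) {
          print(new Date().toISOString() + "  ops/sec: " + (total - prev));
      }
      prev = total;
      sleep(1000);  // mongo shell helper; pauses for 1000 ms
  }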

Reading through the variables, I couldn't find one that directly debugs slow requests.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange