Question

I have two shards, each a replica set of three machines (same specs).

The chunks are reasonably well-distributed:

Shard events at events/xxx:27018,yyy:27018
 data : 6.82GiB docs : 532402 chunks : 59
 estimated data per chunk : 118.42MiB
 estimated docs per chunk : 9023

Shard events2 at events2/zzz:27018,qqq:27018
 data : 7.3GiB docs : 618783 chunks : 66
 estimated data per chunk : 113.31MiB
 estimated docs per chunk : 9375

Totals
 data : 14.12GiB docs : 1151185 chunks : 125
 Shard events contains 48.29% data, 46.24% docs in cluster, avg obj size on shard : 13KiB
 Shard events2 contains 51.7% data, 53.75% docs in cluster, avg obj size on shard : 12KiB

However, the primary on one side has almost 4x the vmsize and a lock % close to 90% (vs. 2% on the other), as well as a much higher btree count. This results in a high number of cursors timing out on that machine.

Both shards should get similar types of queries, and opcounter values are pretty close.

(screenshots: sane host vs. problem host)

How can I diagnose this?

UPDATE: The underperforming side appears to be using a humongous amount of storage for the data, including roughly 100x the space for indexes:

    "ns" : "site_events.listen",
    "count" : 544213,
    "size" : 7500665112,
    "avgObjSize" : 13782.59084586366,
    "storageSize" : 9698657792,
    "numExtents" : 34,
    "nindexes" : 3,
    "lastExtentSize" : 1788297216,
    "paddingFactor" : 1.0009999991378065,
    "systemFlags" : 1,
    "userFlags" : 1,
    "totalIndexSize" : 4630807488,
    "indexSizes" : {
            "_id_" : 26845184,
            "uid_1" : 26664960,
            "list.i_1" : 4577297344
    },

vs

    "ns" : "site_events.listen",
    "count" : 621962,
    "size" : 7891599264,
    "avgObjSize" : 12688.233789202555,
    "storageSize" : 9305386992,
    "numExtents" : 24,
    "nindexes" : 2,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1.0000000000917226,
    "systemFlags" : 1,
    "userFlags" : 1,
    "totalIndexSize" : 45368624,
    "indexSizes" : {
            "_id_" : 22173312,
            "uid_1" : 23195312
    },

Solution

Based on the updated stats, it seems clear that one shard has an index on your sharded collection that is not present on the other shard. This can happen when an index is built on the replica set members in a rolling fashion but someone forgets to build it on both shards, or when an index that was not supposed to be there was not dropped from all replica sets.

In your case, the extra index, "list.i_1", is 4.2GB in size and almost certainly contributes significantly to the performance difference.
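If the index is not supposed to exist it should be dropped everywhere, and if it is supposed to exist it needs to be built on the other shard too. A minimal sketch in the mongo shell - the key spec {"list.i": 1} is inferred from the index name and should be double-checked against getIndexes() first:

    use site_events
    db.listen.getIndexes()                  // run against each shard's primary and compare

    // Either build the missing index everywhere (issued through mongos):
    db.listen.ensureIndex({ "list.i": 1 })

    // ...or drop the unintended index everywhere (also through mongos):
    db.listen.dropIndex("list.i_1")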

The rest of my comments are more general, and some may not apply to your specific case.

Generally speaking, it is not uncommon that users start with one shard (or unsharded replica set) and then add a second shard to take on half of the load.

Unfortunately, the way data gets migrated to shard2 leaves shard1 with fragmented storage, both for data and indexes. Since MongoDB uses memory-mapped files, the larger files end up using more RAM, put more stress on the I/O subsystem, and are in general less performant than the more compact shard2, which received all of its data basically "at once" and is able to store a similar number of documents in less space.

To get shard1 back in shape, you can compact the affected collection(s), or even run repairDatabase() if there are multiple sharded collections. The latter returns the freed space to the OS; compact does not return space to the OS, but it keeps it on the free list to be reused as you insert more data, and the existing data ends up nicely co-located in the minimum possible amount of space.
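A sketch of both options, to be run against the affected shard's members rather than through mongos (both operations block the database, so do them off-peak, rolling through secondaries first):

    use site_events
    // Defragment the collection and rebuild its indexes; freed space stays on the free list:
    db.runCommand({ compact: "listen" })

    // Or rewrite the whole database and return freed space to the OS (needs enough free disk):
    db.repairDatabase()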

Notice that within the affected replica set, even though the primary is much larger than the other shard's primary, its own secondary is significantly smaller. This would happen if the secondary re-synced all of its data at one time, well after the balancing between shards happened. In that case, you can step down your primary and let the more compact secondary take over - it should perform better, and meanwhile you can compact or repair the former primary (now a secondary). Generally, three replica set nodes are recommended so that you are not running without a safety net while doing this sort of maintenance.
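A sketch of that failover step, assuming you are connected to the oversized primary:

    // Ask the bloated primary to step down so the more compact secondary can be elected:
    rs.stepDown(120)      // stay out of elections for ~120 seconds

    // Verify the new topology, then compact/repair the former primary:
    rs.status()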

Another observation I will make is that even though the sharded collection is split more or less evenly across both shards, you have a number of additional collections that live on the primary shard for this database, which is the larger shard. The difference in index sizes is also partly due to the extra indexes for those extra collections, which exist on shard1 and not on shard2. That's normal for a database where only some collections are sharded.

There is not much that can be done about that imbalance except to:

  1. shard the larger of the unsharded collections, or
  2. move half of the unsharded collections into a different database which will have shard2 as its primary shard. This will split the unsharded collections between the two shards more "evenly" (see the sketch after this list).
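Hedged sketches of both options; "site_events.big_unsharded" and "other_db" are placeholder names, and the shard key must be chosen to match your access pattern:

    // Option 1: shard one of the large unsharded collections (placeholder name and key):
    sh.shardCollection("site_events.big_unsharded", { uid: 1 })

    // Option 2: make events2 the primary shard of another database, then move
    // some of the unsharded collections into it:
    db.adminCommand({ movePrimary: "other_db", to: "events2" })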