Question

I have a sharded collection whose shard key is a field called "uuid". The field's value is a string of hexadecimal digits, and it is unique for each document.

MongoDB divides the data into chunks automatically, but I cannot figure out how it divides these hexadecimal strings into contiguous ranges. I could not find any documentation that explains how MongoDB forms these ranges.

Can you please help me to understand how these ranges are formed?

As a sample, I inserted 3025357 documents with such hexadecimal values. The resulting chunks and their ranges are:

{    
    "_id" : "database.sha_shard-uuid_MinKey",
    "lastmod" : Timestamp(2, 0),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : { "$minKey" : 1 }
    },
    "max" : {
        "uuid" : "000043c071f23fc889275f77f950c649faac92e0"
    },
    "shard" : "shardRpSet2",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577632842, 37),
            "shard" : "shardRpSet2"
        }
    ]
},{
    "_id" : "database.sha_shard-uuid_\"5b935a89d91977490d04f740a86bccc2b3cc2bfb\"",
    "lastmod" : Timestamp(3, 5),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : "5b935a89d91977490d04f740a86bccc2b3cc2bfb"
    },
    "max" : {
        "uuid" : "7a25fa7aa3a86ed259f646d7890db370e8b43ae7"
    },
    "shard" : "shardRpSet1",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577632856, 21509),
            "shard" : "shardRpSet1"
        }
    ]
},{
    "_id" : "database.sha_shard-uuid_\"7a25fa7aa3a86ed259f646d7890db370e8b43ae7\"",
    "lastmod" : Timestamp(3, 6),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : "7a25fa7aa3a86ed259f646d7890db370e8b43ae7"
    },
    "max" : {
        "uuid" : "810b573464d4894fc40b428ec82ec54d9a681bf6"
    },
    "shard" : "shardRpSet1",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577632856, 21509),
            "shard" : "shardRpSet1"
        }
    ]
},{
    "_id" : "database.sha_shard-uuid_\"000043c071f23fc889275f77f950c649faac92e0\"",
    "lastmod" : Timestamp(4, 0),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : "000043c071f23fc889275f77f950c649faac92e0"
    },
    "max" : {
        "uuid" : "1e8421c5d4f3eb45a82c2785bccc81fa7abfbfc7"
    },
    "shard" : "shardRpSet2",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577635896, 15268),
            "shard" : "shardRpSet2"
        }
    ]
},{
    "_id" : "database.sha_shard-uuid_\"1e8421c5d4f3eb45a82c2785bccc81fa7abfbfc7\"",
    "lastmod" : Timestamp(5, 0),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : "1e8421c5d4f3eb45a82c2785bccc81fa7abfbfc7"
    },
    "max" : {
        "uuid" : "3d165990d2969bbaf79b6b0d790080b46ca5f056"
    },
    "shard" : "shardRpSet",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577635906, 26457),
            "shard" : "shardRpSet"
        }
    ]
},{
    "_id" : "database.sha_shard-uuid_\"3d165990d2969bbaf79b6b0d790080b46ca5f056\"",
    "lastmod" : Timestamp(5, 1),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : "3d165990d2969bbaf79b6b0d790080b46ca5f056"
    },
    "max" : {
        "uuid" : "5b935a89d91977490d04f740a86bccc2b3cc2bfb"
    },
    "shard" : "shardRpSet1",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577632856, 21509),
            "shard" : "shardRpSet1"
        }
    ]
},{
    "_id" : "database.sha_shard-uuid_\"c1788722a31a5a5a5caa00816ad85aeeda26e581\"",
    "lastmod" : Timestamp(5, 2),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : "c1788722a31a5a5a5caa00816ad85aeeda26e581"
    },
    "max" : {
        "uuid" : "dcbd245e03d425aa14a85b51befde274856fc5f3"
    },
    "shard" : "shardRpSet",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577630416, 3),
            "shard" : "shardRpSet"
        }
    ]
},{
    "_id" : "database.sha_shard-uuid_\"dcbd245e03d425aa14a85b51befde274856fc5f3\"",
    "lastmod" : Timestamp(5, 3),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : "dcbd245e03d425aa14a85b51befde274856fc5f3"
    },
    "max" : {
        "uuid" : "fffff8c5e160711fb48f0d38ce01a98880e869e2"
    },
    "shard" : "shardRpSet",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577630416, 3),
            "shard" : "shardRpSet"
        }
    ]
},{
    "_id" : "database.sha_shard-uuid_\"fffff8c5e160711fb48f0d38ce01a98880e869e2\"",
    "lastmod" : Timestamp(6, 0),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : "fffff8c5e160711fb48f0d38ce01a98880e869e2"
    },
    "max" : {
        "uuid" : { "$maxKey" : 1 }
    },
    "shard" : "shardRpSet2",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577636268, 67),
            "shard" : "shardRpSet2"
        }
    ]
},{
    "_id" : "database.sha_shard-uuid_\"810b573464d4894fc40b428ec82ec54d9a681bf6\"",
    "lastmod" : Timestamp(6, 1),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : "810b573464d4894fc40b428ec82ec54d9a681bf6"
    },
    "max" : {
        "uuid" : "c1788722a31a5a5a5caa00816ad85aeeda26e581"
    },
    "shard" : "shardRpSet",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577630416, 3),
            "shard" : "shardRpSet"
        }
    ]
}

Solution

Reference on how shard chunks work: https://docs.mongodb.com/v4.0/core/sharding-data-partitioning/

MongoDB uses the shard key associated with the collection to partition the data into chunks. A chunk consists of a subset of the sharded data. Each chunk has an inclusive lower and exclusive upper range based on the shard key.

The mongos routes writes to the appropriate chunk based on the shard key value. MongoDB splits chunks when they grow beyond the configured chunk size. Both inserts and updates can trigger a chunk split.
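As a rough illustration of that routing step (a sketch, not how mongos is actually implemented), the chunk boundaries from the question can be sorted and searched with a binary search. The names SPLIT_POINTS, SHARDS, chunk_index and shard_for are made up for this example. Python compares these ASCII strings character by character, which for this data gives the same order as a byte-wise comparison.

```python
from bisect import bisect_right

# Split points (chunk boundaries) copied from the question, sorted ascending.
SPLIT_POINTS = [
    "000043c071f23fc889275f77f950c649faac92e0",
    "1e8421c5d4f3eb45a82c2785bccc81fa7abfbfc7",
    "3d165990d2969bbaf79b6b0d790080b46ca5f056",
    "5b935a89d91977490d04f740a86bccc2b3cc2bfb",
    "7a25fa7aa3a86ed259f646d7890db370e8b43ae7",
    "810b573464d4894fc40b428ec82ec54d9a681bf6",
    "c1788722a31a5a5a5caa00816ad85aeeda26e581",
    "dcbd245e03d425aa14a85b51befde274856fc5f3",
    "fffff8c5e160711fb48f0d38ce01a98880e869e2",
]

# Owning shard of each chunk, in the same sorted order. Chunk 0 is
# [MinKey, SPLIT_POINTS[0]) and the last chunk is [SPLIT_POINTS[-1], MaxKey).
SHARDS = [
    "shardRpSet2", "shardRpSet2", "shardRpSet", "shardRpSet1", "shardRpSet1",
    "shardRpSet1", "shardRpSet", "shardRpSet", "shardRpSet", "shardRpSet2",
]


def chunk_index(uuid: str) -> int:
    """Return the position of the chunk whose [min, max) range holds uuid."""
    # bisect_right counts the split points that are <= uuid, so a value equal
    # to a split point lands in the chunk that starts there (inclusive min).
    return bisect_right(SPLIT_POINTS, uuid)


def shard_for(uuid: str) -> str:
    """Return the shard that owns the chunk containing uuid."""
    return SHARDS[chunk_index(uuid)]


if __name__ == "__main__":
    print(shard_for("5b935a89d91977490d04f740a86bccc2b3cc2bfb"))
```

A value that equals a boundary goes to the chunk that starts at that boundary, which is what "inclusive lower, exclusive upper" means in practice.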

To understand which records go into a chunk, we need to look more closely at the statement "Each chunk has an inclusive lower and exclusive upper range based on the shard key". From here on, I will call this the chunk range.

For example, this chunk:

{    
    "_id" : "database.sha_shard-uuid_MinKey",
    "lastmod" : Timestamp(2, 0),
    "lastmodEpoch" : ObjectId("5e08bad0b5e6b931087f0871"),
    "ns" : "database.sha_shard",
    "min" : {
        "uuid" : { "$minKey" : 1 }
    },
    "max" : {
        "uuid" : "000043c071f23fc889275f77f950c649faac92e0"
    },
    "shard" : "shardRpSet2",
    "history" : [ 
        {
            "validAfter" : Timestamp(1577632842, 37),
            "shard" : "shardRpSet2"
        }
    ]
}

The fields min and max are the chunk range:

 "min" : {
     "uuid" : { "$minKey" : 1 }
 },
 "max" : {
     "uuid" : "000043c071f23fc889275f77f950c649faac92e0"
 },
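The chunks in the question are listed in no particular order, which hides the fact that their ranges tile the whole key space. The sketch below sorts them by min and checks that each max equals the next min. To keep it readable, every boundary is shortened to its first 8 hexadecimal characters (an abbreviation made for this example only), and MIN_KEY / MAX_KEY are hypothetical stand-ins for the BSON MinKey and MaxKey values.

```python
# Hypothetical stand-ins for BSON MinKey / MaxKey.
MIN_KEY, MAX_KEY = "$minKey", "$maxKey"

# (min, max) pairs in the order the question lists them, with each boundary
# abbreviated to its first 8 hex characters for readability.
chunks = [
    (MIN_KEY, "000043c0"),
    ("5b935a89", "7a25fa7a"),
    ("7a25fa7a", "810b5734"),
    ("000043c0", "1e8421c5"),
    ("1e8421c5", "3d165990"),
    ("3d165990", "5b935a89"),
    ("c1788722", "dcbd245e"),
    ("dcbd245e", "fffff8c5"),
    ("fffff8c5", MAX_KEY),
    ("810b5734", "c1788722"),
]


def sort_key(bound: str):
    """Order MinKey before every string and MaxKey after every string."""
    if bound == MIN_KEY:
        return (0, "")
    if bound == MAX_KEY:
        return (2, "")
    return (1, bound)


# Sort the chunks by their lower bound.
ordered = sorted(chunks, key=lambda chunk: sort_key(chunk[0]))

# The ranges are contiguous if every chunk ends where the next one begins.
contiguous = all(a[1] == b[0] for a, b in zip(ordered, ordered[1:]))

if __name__ == "__main__":
    for low, high in ordered:
        print(f"[{low}, {high})")
    print("contiguous:", contiguous)
```

Sorted this way, the ten chunks form one unbroken sequence from MinKey to MaxKey, so every possible uuid value belongs to exactly one chunk.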

The chunk range defines what goes inside the chunk. To understand how a value is compared against the range, read the BSON comparison order reference: https://docs.mongodb.com/v4.0/reference/bson-type-comparison-order/

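In your case, if the uuid field contains only strings, this is the rule from that reference that decides whether a record falls inside the chunk range:

Strings: Binary Comparison

By default, MongoDB uses the simple binary comparison to compare strings.

In other words, the strings are compared byte by byte. For lowercase hexadecimal strings of equal length, that order is the same as the numeric order of the values they represent. A document belongs to the chunk whose min is less than or equal to its uuid and whose max is greater than its uuid. MinKey and MaxKey compare lower and higher than every other value, so the first and last chunks catch everything outside the string boundaries.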
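To make the comparison concrete, here is a small sketch that mimics a byte-wise comparison by encoding the strings as UTF-8 bytes. It is only an approximation of what the server does, and the helper names in_chunk and same_as_numeric_order are invented for this example. The two boundaries are taken from one of the chunks in the question.

```python
def in_chunk(value: str, lo: str, hi: str) -> bool:
    """Return True if value lies in [lo, hi) under byte-wise comparison."""
    v, low, high = (s.encode("utf-8") for s in (value, lo, hi))
    return low <= v < high


def same_as_numeric_order(a: str, b: str) -> bool:
    """Check that byte order and numeric order agree for two hex strings."""
    byte_order = a.encode("utf-8") < b.encode("utf-8")
    numeric_order = int(a, 16) < int(b, 16)
    return byte_order == numeric_order


# Range of one chunk from the question.
LO = "5b935a89d91977490d04f740a86bccc2b3cc2bfb"
HI = "7a25fa7aa3a86ed259f646d7890db370e8b43ae7"

if __name__ == "__main__":
    print(in_chunk("6" * 40, LO, HI))         # value between the bounds
    print(in_chunk(LO, LO, HI))               # lower bound is inclusive
    print(in_chunk(HI, LO, HI))               # upper bound is exclusive
    print(same_as_numeric_order(LO, HI))      # lowercase, equal length
    print(same_as_numeric_order("F0", "a0"))  # mixed case
```

The last call returns False: the byte for "F" is smaller than the byte for "a", so mixed-case hexadecimal strings would not sort in numeric order. With consistently lowercase, fixed-length values like the ones in the question, the two orders agree.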

Licensed under: CC-BY-SA with attribution