Build index against counter fields

https://stackoverflow.com/questions/9047128

03-12-2019

Question

For a field that functions as a counter (i.e., its value changes over time) and is used to return ordered entities (we sort the filtered entities by this field), should we build an index on this field?

Solution

It's not entirely clear, but I think the question is whether the cost of maintaining an index on a frequently updated field outweighs the benefit of fast querying and sorting on that field. You also imply that your query will filter on a different field and then sort on this one. Feel free to elaborate on your exact use case.

What I think you want is something like this:

db.test.insertMany([
  // store the counter as a number: as a string it would sort "1" < "10" < "2", and $inc would fail on it
  {filter: "stuff", count: 1},
  {filter: "stuff", count: 3},
  {filter: "stuff", count: 2},
  {filter: "notstuff", count: 2},
  {filter: "notstuff", count: 2}
]);

And then an index like so:

db.test.createIndex({filter: 1, count: 1});

And then a query like so:

db.test.find({filter: "stuff"}).sort({count: 1});
{ "_id" : ObjectId("4f24353eef88b8b53a20fdf5"), "filter" : "stuff", "count" : 1 }
{ "_id" : ObjectId("4f24353eef88b8b53a20fdf7"), "filter" : "stuff", "count" : 2 }
{ "_id" : ObjectId("4f24353eef88b8b53a20fdf6"), "filter" : "stuff", "count" : 3 }

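Which uses the B-tree index (this is the legacy explain format of MongoDB before 3.0; newer versions report an IXSCAN stage instead of a BtreeCursor):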

db.test.find({filter:"stuff"}).sort({count:1}).explain();
{
"cursor" : "BtreeCursor filter_1_count_1",
"nscanned" : 3,
"nscannedObjects" : 3,
...
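The reason a single index can serve both the filter and the sort is the key order of a compound index. Below is a toy model in plain JavaScript (Node.js, no MongoDB needed), a sketch rather than MongoDB's actual implementation: a sorted array stands in for the B-tree, and `compareKeys`, `lowerBound` and `findSortedByCount` are hypothetical helpers. Keys are ordered by `filter` first and `count` second, so all keys for one filter value are adjacent and already in `count` order; the query seeks once and reads them in sequence, with no in-memory sort.

```javascript
// Toy model of a compound B-tree index on {filter: 1, count: 1}.
// Assumption: a plain array kept in (filter, count, id) order stands in for the B-tree.
function compareKeys(a, b) {
  if (a.filter !== b.filter) return a.filter < b.filter ? -1 : 1;
  if (a.count !== b.count) return a.count < b.count ? -1 : 1;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

function buildIndex(docs) {
  return docs
    .map(d => ({ filter: d.filter, count: d.count, id: d._id }))
    .sort(compareKeys);
}

// Binary search for the first key whose filter is >= the target (the "seek").
function lowerBound(index, filter) {
  let lo = 0, hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].filter < filter) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// find({filter}).sort({count: 1}): seek to the filter value, then walk forward.
// Keys come out already ordered by count, so nothing is sorted in memory.
function findSortedByCount(index, filter) {
  const ids = [];
  let scanned = 0;
  for (let i = lowerBound(index, filter); i < index.length && index[i].filter === filter; i++) {
    scanned++;
    ids.push(index[i].id);
  }
  return { ids, scanned };
}

const docs = [
  { _id: 1, filter: "stuff", count: 1 },
  { _id: 2, filter: "stuff", count: 3 },
  { _id: 3, filter: "stuff", count: 2 },
  { _id: 4, filter: "notstuff", count: 2 },
  { _id: 5, filter: "notstuff", count: 2 },
];
const index = buildIndex(docs);
const result = findSortedByCount(index, "stuff");
console.log(result); // { ids: [ 1, 3, 2 ], scanned: 3 }

// Key order depends on the stored type: strings compare character by character.
console.log(["10", "2", "1"].sort()); // [ '1', '10', '2' ]
console.log([10, 2, 1].sort((a, b) => a - b)); // [ 1, 2, 10 ]
```

The last two lines show why the counter should be stored as a number: as strings, "10" would sort before "2".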

Now, it may really depend on how many documents your filter matches. If it's only a few, you could probably sort them on this field without an index, which would improve update performance. I'm curious, so I'll run a few tests and update this answer in a bit.

Update: I wrote this benchmark to show the difference between sorting with and without an index, and between updating a count field with and without an index on it. Full code here: https://gist.github.com/1696041

It inserts 700K docs in one run and 7M docs in another (to get some variety), separated into 7 "filters" (100K or 1M docs per filter). Then it increments the count of a randomly picked doc, 1M times. In the 7M run, the 1M docs per filter are too many to sort in memory without a limit, so the only way to show how that case behaves is to add a limit.

The conclusion is as expected. Updating the count field takes longer when there's an index on it (roughly twice as long in this test, although twice as long is still pretty fast), but queries that sort on it are much faster. You have to decide which matters more to you.
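The update-side cost can be illustrated with another toy model in plain JavaScript (Node.js). It is a sketch under stated assumptions: a sorted array of `[filter, count, id]` keys stands in for the B-tree on `{filter: 1, count: 1}`, each `splice` stands in for one index key delete or insert, and `cmp`, `position` and `incrementCount` are hypothetical names. When `count` is indexed, every increment moves the document's key to a new position (one delete plus one insert); when it is not, the increment touches no index keys. Real costs depend on the storage engine and are not modeled here.

```javascript
// Toy model of the update-side cost of an indexed counter.
// Assumption: a sorted array of [filter, count, id] keys stands in for the
// B-tree on {filter: 1, count: 1}; each splice stands in for one key delete/insert.
function cmp(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// Binary search: position of the first key >= the given key.
function position(index, key) {
  let lo = 0, hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (cmp(index[mid], key) < 0) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Increment doc.count and, if an index on count is given, keep it in sync.
// Returns the number of index key modifications the increment needed.
function incrementCount(doc, countIndex) {
  const oldKey = [doc.filter, doc.count, doc._id];
  doc.count += 1;
  if (!countIndex) return 0; // count not indexed: no index key changes
  const newKey = [doc.filter, doc.count, doc._id];
  countIndex.splice(position(countIndex, oldKey), 1); // delete the old key
  countIndex.splice(position(countIndex, newKey), 0, newKey); // insert the new key
  return 2;
}

const docs = [
  { _id: 1, filter: "stuff", count: 1 },
  { _id: 2, filter: "stuff", count: 3 },
  { _id: 3, filter: "stuff", count: 2 },
];
const countIndex = docs.map(d => [d.filter, d.count, d._id]).sort(cmp);

let indexOps = 0;
for (let i = 0; i < 3; i++) indexOps += incrementCount(docs[0], countIndex);
console.log(docs[0].count, indexOps); // 4 6
// ids listed in ascending count order after doc 1 moved from first to last
console.log(countIndex.map(k => k[2])); // [ 3, 2, 1 ]

const unindexed = { _id: 9, filter: "stuff", count: 0 };
console.log(incrementCount(unindexed)); // 0
```

Three increments cost six index modifications with the count index and none without it, which is the kind of extra work behind the slower indexed updates above.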

The output is here (running on my MacBook Pro with an SSD):

> bench();
benchmarking with index on {filter,data}, 700K docs  
initialInsert of 700000 done in: 58304ms, 0.08329142857142857ms per insert
updateCounts 1000000 times done in: 103915ms, 0.103915ms per update
explain find({filter:"abcd"}).sort({count:-1}): 
   cursor: BtreeCursor filter_1_data_1
   nscanned: 100000
   scanAndOrder: true
   millis: 1235
explain find({filter:"abcd"}).limit(100).sort({count:-1}): 
   cursor: BtreeCursor filter_1_data_1
   nscanned: 100000
   scanAndOrder: true
   millis: 614
benchmarking with index on {filter,data} and {filter, count}, 700k docs
initialInsert of 700000 done in: 72108ms, 0.10301142857142857ms per insert
updateCounts 1000000 times done in: 202778ms, 0.202778ms per update
explain find({filter:"abcd"}).sort({count:-1}): 
   cursor: BtreeCursor filter_1_count_-1
   nscanned: 100000
   scanAndOrder: undefined
   millis: 139
explain find({filter:"abcd"}).limit(100).sort({count:-1}): 
   cursor: BtreeCursor filter_1_count_-1
   nscanned: 100
   scanAndOrder: undefined
   millis: 0
benchmarking with index on {filter,data}, 7M docs
initialInsert of 7000000 done in: 616701ms, 0.08810014285714286ms per insert
updateCounts 1000000 times done in: 134655ms, 0.134655ms per update
explain find({filter:"abcd"}).sort({count:-1}): 
***too big to sort without limit!***
explain find({filter:"abcd"}).limit(100).sort({count:-1}): 
   cursor: BtreeCursor filter_1_data_1
   nscanned: 1000000
   scanAndOrder: true
   millis: 6396
benchmarking with index on {filter,data} and {filter, count}, 7M docs
initialInsert of 7000000 done in: 891556ms, 0.12736514285714284ms per insert
updateCounts 1000000 times done in: 280885ms, 0.280885ms per update
explain find({filter:"abcd"}).sort({count:-1}): 
   cursor: BtreeCursor filter_1_count_-1
   nscanned: 1000000
   scanAndOrder: undefined
   millis: 1337
explain find({filter:"abcd"}).limit(100).sort({count:-1}): 
   cursor: BtreeCursor filter_1_count_-1
   nscanned: 100
   scanAndOrder: undefined
   millis: 0
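One more detail in these numbers: without the `{filter, count}` index, adding `limit(100)` leaves `nscanned` unchanged (100000 and 1000000), while with the index it drops to 100. A toy model in plain JavaScript (Node.js) illustrates why. It is a sketch under stated assumptions: an index on `filter` is assumed to locate the matching docs, the data are 7 filters of 1000 synthetic docs with made-up deterministic counts, and `topKWithoutIndex` and `topKWithIndex` are hypothetical names.

```javascript
// Toy model of find({filter}).sort({count: -1}).limit(k).
// "scanned" plays the role of nscanned in the explain output above.

// Without an index on count: assume an index on filter finds the matching docs,
// but every one of them must be examined and sorted, since any of them could
// belong in the top k (what explain reports as scanAndOrder: true).
function topKWithoutIndex(docs, filter, k) {
  const matched = docs.filter(d => d.filter === filter);
  const counts = matched.map(d => d.count).sort((a, b) => b - a).slice(0, k);
  return { counts, scanned: matched.length };
}

// With an index on {filter: 1, count: -1}: keys for one filter are adjacent
// and already in descending count order, so the scan stops after k keys.
function topKWithIndex(indexKeys, filter, k) {
  const counts = [];
  let i = indexKeys.findIndex(key => key.filter === filter); // stands in for a B-tree seek
  if (i === -1) return { counts, scanned: 0 };
  while (i < indexKeys.length && indexKeys[i].filter === filter && counts.length < k) {
    counts.push(indexKeys[i].count);
    i++;
  }
  return { counts, scanned: counts.length };
}

// Synthetic data: 7 filters x 1000 docs with deterministic, made-up counts.
const filters = ["f0", "f1", "f2", "f3", "f4", "f5", "f6"];
const docs = [];
for (let i = 0; i < 7000; i++) {
  docs.push({ _id: i, filter: filters[i % 7], count: (i * 7919) % 1000 });
}
const indexKeys = docs
  .map(d => ({ filter: d.filter, count: d.count }))
  .sort((a, b) => (a.filter !== b.filter ? (a.filter < b.filter ? -1 : 1) : b.count - a.count));

const slow = topKWithoutIndex(docs, "f3", 100);
const fast = topKWithIndex(indexKeys, "f3", 100);
console.log(slow.scanned, fast.scanned); // 1000 100
console.log(JSON.stringify(slow.counts) === JSON.stringify(fast.counts)); // true
```

Both paths return the same top 100 counts, but without the index the work grows with the number of matching docs, while with it the work grows only with the limit.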

Other tips

Strange question. Indexes are used for efficient queries: if you query on a field, you are likely interested in creating an index on it. explain() tells you about the execution plan. This is all covered in depth in the MongoDB documentation, so why ask such a basic question?

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow