Question

I'm trying to understand how best to work with indices in MongoDB. Let's say that I have a collection of documents like this one:

{
  _id:        1,
  keywords:   ["gap", "casual", "shorts", "oatmeal"],
  age:        21,
  brand:     "Gap",
  color:     "Black",
  gender:    "female",     
  retailer:  "Gap",
  style:     "Casual Shorts",
  student:    false,
  location:  "US",
}

and I regularly run a query to find all documents that match a set of criteria for each of those fields, something like:

db.items.find({ age:      { $gt: 13, $lt: 40 },
                brand:    { $in: ['Gap', 'Target'] },
                retailer: { $in: ['Gap', 'Target'] },
                gender:   { $in: ['male', 'female'] },
                style:    { $in: ['Casual Shorts', 'Jeans']},
                location: { $in: ['US', 'International'] },
                color:    { $in: ['Black', 'Green'] },
                keywords: { $all: ['gap', 'casual'] }
              })

I'm trying to figure out what sort of index I can create to improve the speed of a query such as this. Should I create a compound index like this:

db.items.ensureIndex({ age: 1, brand: 1, retailer: 1, gender: 1, style: 1, location: 1, color: 1, keywords: 1})

or is there a better set of indices I can create to optimize this query?


Solution

Should I create a compound index like this:

db.items.ensureIndex({age: 1, brand: 1, retailer: 1, gender: 1, style: 1, location: 1, color: 1, keywords: 1})

You can create an index like the one above, but you would be indexing almost the entire collection. Indexes take space, and the more fields in the index, the more space it uses. That space is usually RAM, although indexes can be swapped out to disk. They also incur a write penalty, since every insert or update must update the index as well.

Your index seems wasteful: indexing just a few of those fields will probably let MongoDB scan a set of documents that is already close to the expected result of the find operation.

Is there a better set of indices I can create to optimize this query?

As I said before, probably yes. But this question is very difficult to answer without knowing details of the collection, such as the number of documents it has, which values each field can have, how those values are distributed in the collection (50% gender male, 50% gender female?), how they correlate with each other, etc.

There are a few indexing strategies, but normally you should strive to create indexes with high selectivity. Choose "small" field combinations that help MongoDB locate the desired documents while scanning a "reasonable" number of them. Again, "small" and "reasonable" depend on the characteristics of the collection and of the query you are performing.
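To make "selectivity" concrete, here is a minimal sketch in plain JavaScript that ranks a few of the query's conditions by the fraction of documents each one matches. The sample documents, the helper names (matchFraction, rankBySelectivity) and the resulting numbers are all made up for illustration; on a real collection you would measure this against your own data.

```javascript
// Hypothetical sample documents; a real collection would be far larger.
const sample = [
  { age: 21, brand: "Gap",    gender: "female", location: "US" },
  { age: 45, brand: "Levis",  gender: "male",   location: "US" },
  { age: 30, brand: "Target", gender: "female", location: "International" },
  { age: 12, brand: "Nike",   gender: "male",   location: "US" },
  { age: 35, brand: "Nike",   gender: "female", location: "US" },
  { age: 50, brand: "Levis",  gender: "male",   location: "International" },
  { age: 18, brand: "Adidas", gender: "male",   location: "US" },
  { age: 60, brand: "Gap",    gender: "female", location: "US" },
];

// One predicate per field, mirroring the conditions in the question's find().
const predicates = {
  age:      d => d.age > 13 && d.age < 40,
  brand:    d => ["Gap", "Target"].includes(d.brand),
  gender:   d => ["male", "female"].includes(d.gender),
  location: d => ["US", "International"].includes(d.location),
};

// Fraction of documents that a single-field condition matches.
// A lower fraction means the condition is more selective.
function matchFraction(docs, predicate) {
  return docs.filter(predicate).length / docs.length;
}

// Sort fields from most selective (smallest fraction) to least selective.
function rankBySelectivity(docs, preds) {
  return Object.entries(preds)
    .map(([field, pred]) => ({ field, fraction: matchFraction(docs, pred) }))
    .sort((a, b) => a.fraction - b.fraction);
}

const ranking = rankBySelectivity(sample, predicates);
console.log(ranking);
```

In this made-up sample the gender and location conditions match every document, so indexing them would not narrow the scan at all, while brand and age do narrow it. That is the kind of difference to look for in your own data.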

Since this is a fairly complex subject, here are some references that should help you build more appropriate indexes.

http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
http://docs.mongodb.org/manual/faq/indexes/#how-do-you-determine-what-fields-to-index
http://docs.mongodb.org/manual/tutorial/create-queries-that-ensure-selectivity/

And use cursor.explain to evaluate your indexes.

http://docs.mongodb.org/manual/reference/method/cursor.explain/
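As a rough sketch of what to look at in the explain output: in MongoDB 3.0 and later, explain("executionStats") reports nReturned, totalKeysExamined and totalDocsExamined (older shells reported n, nscanned and nscannedObjects instead). The helper below, summarizeExplain, is a hypothetical name, and the two-field index in the comment is only an example candidate, not a recommendation for your data.

```javascript
// Condense the document returned by cursor.explain("executionStats")
// into the numbers that matter when comparing candidate indexes.
function summarizeExplain(explainDoc) {
  const stats = explainDoc.executionStats;
  return {
    returned: stats.nReturned,
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    // Documents fetched per document returned; values near 1 mean the
    // index already narrowed the scan to (almost) exactly the result set.
    docsPerResult: stats.nReturned === 0
      ? null
      : stats.totalDocsExamined / stats.nReturned,
  };
}

// In the mongo shell, with a candidate index in place (assumed example):
//   db.items.createIndex({ brand: 1, age: 1 })
//   var plan = db.items.find({ brand: { $in: ['Gap', 'Target'] },
//                              age:   { $gt: 13, $lt: 40 } })
//                      .explain("executionStats")
//   summarizeExplain(plan)

// Hand-written stand-in for an explain document, for illustration only.
const fakePlan = {
  executionStats: { nReturned: 120, totalKeysExamined: 150, totalDocsExamined: 150 },
};
console.log(summarizeExplain(fakePlan));
```

Run the same query against each candidate index and compare the summaries: the index that examines the fewest keys and documents for the same number of results is usually the better one.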

OTHER TIPS

A large index like this one will penalize you on writes. It is better to index just what you need and let MongoDB's query optimizer do most of the work for you. You can always give it a hint or, as a last resort, reindex if your application or data usage changes drastically.

Your query will use the index for the fields that have one (fast), then examine the documents it fetched to check the remaining conditions (slower).

Depending on your application, a few standalone indexes could be better. Adding more indexes will not necessarily improve performance; with the write penalty, it could even make it worse (YMMV).

Here is a basic algorithm for selecting fields to put in an index:

  • What single field is in a query the most often?
  • If that single field is present in a query, will a table scan be expensive?
  • What other field could you index to further reduce the table scan?
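The first question in the list above can be answered mechanically if you keep a record of the filters your application sends. This is a minimal sketch in plain JavaScript; the queryLog contents and the fieldFrequency helper are hypothetical and only illustrate the counting step.

```javascript
// Hypothetical log of filter documents passed to find().
const queryLog = [
  { brand: "Gap", age: { $gt: 13 } },
  { brand: "Target" },
  { brand: "Gap", color: "Black" },
  { age: { $lt: 40 }, gender: "female" },
];

// Count how many logged filters mention each top-level field,
// then list the fields from most to least frequently queried.
function fieldFrequency(filters) {
  const counts = new Map();
  for (const filter of filters) {
    for (const field of Object.keys(filter)) {
      counts.set(field, (counts.get(field) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const frequency = fieldFrequency(queryLog);
console.log(frequency);
```

The field at the top of the list is the first candidate for an index. Whether it is worth indexing still depends on how selective it is, which is what the second and third questions are about.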

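This index seems to be very reasonable for your query. If you also project only fields that are in the index, MongoDB calls this a covered query: there is no need to access the documents, because all data can be fetched from the index. Note that the find() above returns whole documents, and that an index on an array field such as keywords is a multikey index, which cannot cover queries on that array field, so the query as written would not be covered.

From the docs: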

"Because the index “covers” the query, MongoDB can both match the query conditions and return the results using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query. An index can also cover an aggregation pipeline operation on unsharded collections."

Some remarks:

  • This index will only be used by queries that include a filter on age. A query that only filters by brand or retailer will probably not use this index.

  • Adding an index on only one or two of the most selective fields of your query will already bring a very significant performance boost. The more fields you add the larger the index size will be on disk.

  • You may want to generate some random sample data and measure the performance of this with different indexes or sets of indexes. This is obviously the safest way to know.
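The first remark above follows from how compound indexes are ordered: a query can narrow the scan only through the leading fields of the index. Here is a deliberately simplified model of that rule in plain JavaScript. The usablePrefix helper is hypothetical, and the real query planner is more nuanced than this, so treat it as an intuition aid and confirm with explain.

```javascript
// Field order of the compound index from the question.
const indexFields = ["age", "brand", "retailer", "gender",
                     "style", "location", "color", "keywords"];

// Simplified model: return the leading index fields that the query
// filters on, stopping at the first index field the query omits.
function usablePrefix(index, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of index) {
    if (!wanted.has(field)) break;
    prefix.push(field);
  }
  return prefix;
}

// Filters on age and brand: both leading fields can be used.
console.log(usablePrefix(indexFields, ["age", "brand"]));

// Filters on brand and retailer only: age is missing, so under this
// model no leading field is available.
console.log(usablePrefix(indexFields, ["brand", "retailer"]));
```

If queries that filter only on brand or retailer are common, this suggests either putting one of those fields first or adding a separate, smaller index for them.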

Licensed under: CC-BY-SA with attribution