Question

I was under the impression that a ReferenceField in MongoEngine also creates an index on that field. MongoEngine borrows a lot from the Django ORM style, and Django creates an index for its ForeignKeys, so I was expecting the same to happen here.

For example, I have these two simple document definitions:

import mongoengine as me

class Group(me.Document):
    name = me.StringField()
    meta = {'collection': 'groups'}

class Item(me.Document):
    name = me.StringField()
    group = me.ReferenceField(Group)

But if I look up the indexes in the mongo shell, there is no index for the ReferenceField:

> db.item.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "me_tests.item",
        "name" : "_id_"
    }
]
> 

Is there any reason not to do this?

I was having a problem on a production server with ~60,000 items: it took ~234 s to look up all item groups, but once I indexed the ReferenceField that number dropped to ~2 s. So I guess the performance argument is quite clear.

Solution

There are no joins in MongoDB and as such a ReferenceField is just an ordinary field that happens to store an ObjectId.

Indexes should be created with thought and planning: there is a cost to having one as well as to not having one. So "What's the best index for a schema?" Well, that really only depends on one thing: usage.
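
To make the trade-off concrete, here is a toy in-memory model in plain Python. It is only an illustration, not how MongoDB implements indexes; the ToyCollection class and its methods are hypothetical names made up for this sketch.

```python
from collections import defaultdict


class ToyCollection:
    """In-memory stand-in for a collection with an optional index on 'group'."""

    def __init__(self, indexed=False):
        self.items = []
        self.group_index = defaultdict(list) if indexed else None

    def insert(self, item):
        self.items.append(item)
        if self.group_index is not None:
            # Cost of having an index: extra work and memory on every write.
            self.group_index[item["group"]].append(item)

    def find_by_group(self, group):
        if self.group_index is not None:
            # With the index, only the matching items are touched.
            return list(self.group_index.get(group, []))
        # Cost of not having one: every item is examined on every query.
        return [item for item in self.items if item["group"] == group]


plain = ToyCollection(indexed=False)
indexed = ToyCollection(indexed=True)
for i in range(1000):
    doc = {"name": "item%d" % i, "group": i % 10}
    plain.insert(doc)
    indexed.insert(doc)
```

Both collections return the same results for find_by_group(3); they differ only in where the work is done, which is why the right choice depends on how often you write versus how often you query by that field.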

How are you using your data, and how are you querying for that data? That should drive the design of your indexes, not what type of data you are storing.*

So for the best performance, tune your queries (like you have done). Using the built-in profiling is a good start.
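
Alongside the profiler, the explain output of a query tells you whether it walked the whole collection or used an index. A small sketch for reading it, assuming the queryPlanner format that MongoDB 3.0 and later return (newer servers may nest the plan differently); the helper names and the two sample dicts are hand-written for illustration, not real server output:

```python
def plan_stages(plan):
    """Return the stage names of an explain plan, outermost first."""
    stages = []
    while plan:
        stages.append(plan.get("stage"))
        plan = plan.get("inputStage")
    return stages


def uses_collection_scan(explain_output):
    """True if the winning plan contains a full collection scan."""
    winning = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    return "COLLSCAN" in plan_stages(winning)


# Hand-written samples in the assumed format (not real server output).
sample_without_index = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}
}
sample_with_index = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "group_1"},
        }
    }
}
```

With MongoEngine you could pass the result of something like Item.objects(group=some_group).explain() to uses_collection_scan, where some_group is a placeholder for one of your Group documents.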

* As ever there is an exception that proves the rule - geo data :)

Licensed under: CC-BY-SA with attribution