How do I perform the SQL Join equivalent in PyMongo? Or, more specifically, how do I call a PyMongo collection object in BSON code?

StackOverflow https://stackoverflow.com/questions/18897138

Question

I'm trying to perform a SQL Join equivalent in pymongo like so: http://blog.knoldus.com/2013/02/03/joins-now-possible-in-mongodb/

The thing is, I am stuck as bson cannot encode collection objects.

bson.errors.InvalidDocument: Cannot encode object: Collection(Database(MongoClient('localhost', 27017), 'Application'), 'Products')

All relevant code:

class Delivery(Repository):

    COLLECTION_ATTRIBUTE = 'deliveriesCollection'

    def __init__(self):
        Repository.__init__(self, self.COLLECTION_ATTRIBUTE)

    def printTable(self):
        from bson.code import Code

        mapper = Code('function() {'
                    '    product = ProductCollection.findOne({_id:this.Product_ID});'
                    '    data = {'
                    '        \'Name\':this.Name,'
                    '        \'Product_ID\': product.ID'
                    '    };'  
                    '    emit(this._id, data );'
                    '}', ProductCollection = product.collection)

        reducer = Code('function(key, values) {'
                    '    return values[0];'
                    '}')

        result = self.collection.map_reduce(mapper, reducer, "myresults")

        for doc in result.find():
            print(doc)

delivery = Delivery()
product = Product()

Solution

While I'm not sure of your schema, you can't pass a local Python reference to a MongoDB collection into the Code object and have it be available on the database server. You could pass a scope object, but that's not what you need here. Ultimately, the code is just JavaScript, so it would need to access the Products collection natively on the server.

However, I was just reminded that as of MongoDB 2.4, it's no longer possible to access other collections, databases, or shards from a MapReduce. So you can't read other documents, whether in the same collection or a different one, from either the map or the reduce function.

There are a number of suggestions that you'll find on the internet if you search for joins, map reduce, mongodb, etc. They typically run multi-step map reduces into the same collection. It's not simple, nor necessarily efficient.

From your code, it looks like you're just trying to do a quick lookup of Product names. There are a number of ways you might optimize that without needing a join. I'd suggest caching the names locally. When a name isn't in the cache, use the $in operator to fetch the set of products you need, with a projection to limit the results to the minimum fields (such as the name), and cache those results. Then do a "local" join on the client in Python: grab the name value and expose it as a "virtual" property on your Delivery class, or attach it somewhere downstream where a client consumes it.
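The caching plus $in approach described above might look roughly like the sketch below. It is only an illustration under assumptions: the helper name attach_product_names, the Product_Name output key, and the idea that each product document has a Name field are all hypothetical, since the real schema isn't shown. The only PyMongo call used is Collection.find(filter, projection).

```python
def attach_product_names(deliveries, products_collection, cache=None):
    """Client-side "join": add a product name to each delivery document.

    deliveries: list of delivery dicts with a 'Product_ID' field (as in the question).
    products_collection: a pymongo Collection (or any object with a compatible find()).
    cache: dict mapping product _id -> name, reused across calls.
    Field names 'Name' and 'Product_Name' are assumptions for illustration.
    """
    if cache is None:
        cache = {}

    # Only query for product ids that are not cached yet.
    missing = list({d["Product_ID"] for d in deliveries} - set(cache))
    if missing:
        cursor = products_collection.find(
            {"_id": {"$in": missing}},  # one round trip for all missing ids
            {"Name": 1},                # projection: _id plus Name only
        )
        for product in cursor:
            cache[product["_id"]] = product.get("Name")

    # The "local" join: copy each delivery and add the looked-up name
    # (None when no matching product was found).
    return [dict(d, Product_Name=cache.get(d["Product_ID"])) for d in deliveries]
```

With the classes from the question, a call could be something like attach_product_names(list(delivery.collection.find()), product.collection, cache), where cache is a dict you keep around between calls so repeated lookups don't hit the database again.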

As MongoDB intentionally does not support joins, it's usually best to consider whether your collection and document structures are well designed for the access patterns you need.

You could also just create the map reduce function in the MongoDB console.

Licensed under: CC-BY-SA with attribution