Question

I have some collections with identical schema and I want to perform a merge + aggregation on them. The schemas are simple and look like this:

{ 'fr': 1, 'to': 1, 'wg': 213}
{ 'fr': 1, 'to': 2, 'wg': 53}
{ 'fr': 2, 'to': 2, 'wg': 5521}

The following code works for merging two collections, but I am wondering if there is a faster solution and/or one that could merge multiple collections in a similar way without creating nested calls:

var c = db.collection('first').find()

c.each(function(err, doc) {
    if (err) throw err

    if (doc == null) {
        console.log('done')
        return
    }
    db.collection('second').findOne({
        'fr': doc['fr'],
        'to': doc['to']
    }, function(err, doc2) {
        if (err) throw err

        db.collection('my_results').save({
            'fr': doc['fr'],
            'to': doc['to'],
            'wg': doc['wg'] + doc2['wg']
        }, function(err) {
            if (err) throw err
        })
    })
})

Solution

No operation here is entirely free, since you cannot do joins with MongoDB. But you can get the output you want using mapReduce and some of its features.

So first create a mapper:

var mapper = function () {

  emit( { fr: this.fr, to: this.to }, this.wg )

};

And then a reducer:

var reducer = function (key,values) {

  return Array.sum( values );

};
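
A reducer's own output can be handed back to it as one of the values, so it should give the same answer whether it sees everything at once or in stages. As a rough sketch in plain JavaScript (outside the shell, so `sum` is a hypothetical stand-in for `Array.sum`, and the numbers are only sample values):

```javascript
// Stand-in for the mongo shell's Array.sum helper (assumption: this runs
// outside the shell, e.g. in Node.js, where Array.sum does not exist).
function sum(values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// Same shape as the reducer above: (key, values) -> single value.
function reducer(key, values) {
  return sum(values);
}

var key = { fr: 1, to: 1 };

// Reducing all values in one call...
var allAtOnce = reducer(key, [213, 53, 5521]);

// ...should match reducing a partial result together with the rest.
var partial = reducer(key, [213, 53]);
var inStages = reducer(key, [partial, 5521]);

console.log(allAtOnce === inStages);
```

A plain sum has this property, which is why it is safe to use here.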

Then you run the mapReduce operation with the output set to a different collection:

db.first.mapReduce(mapper,reducer,{ "out": { "reduce": "third" } })

Note the "out" options there, which are explained in the MongoDB manual. Despite possibly misleading statistics output in the console, the "reduce" option is very important. It matters when we run the same code against the other collection:

db.second.mapReduce(mapper,reducer,{ "out": { "reduce": "third" } })

What actually happens is that the output of the first operation is also passed into the "reduce" phase of the second operation.

The end result is that all the values from both collections with the same key values will be added together in the "third" collection:

{ "_id" : { "fr" : 1, "to" : 1 }, "value" : 426 }
{ "_id" : { "fr" : 1, "to" : 2 }, "value" : 106 }
{ "_id" : { "fr" : 2, "to" : 2 }, "value" : 11042 }
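
To see why the totals come out doubled, here is a minimal in-memory simulation of the "reduce" output mode in plain JavaScript. It does not talk to MongoDB at all: `mapReduceInto` is a hypothetical helper, the arrays stand in for the collections, and I am assuming both collections hold the three sample documents from the question (which is what the totals above suggest).

```javascript
// In-memory stand-ins for the two collections (assumed sample data).
var first = [
  { fr: 1, to: 1, wg: 213 },
  { fr: 1, to: 2, wg: 53 },
  { fr: 2, to: 2, wg: 5521 }
];
var second = first.map(function (d) {
  return { fr: d.fr, to: d.to, wg: d.wg };
});

function reducer(key, values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// Hypothetical helper imitating mapReduce with out: { reduce: target }.
// It groups the emitted values by key, reduces each group, and then
// re-reduces against whatever the target already holds for that key.
function mapReduceInto(docs, target) {
  var groups = new Map();
  docs.forEach(function (doc) {
    var id = JSON.stringify({ fr: doc.fr, to: doc.to }); // emitted key
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(doc.wg);                         // emitted value
  });
  groups.forEach(function (values, id) {
    var key = JSON.parse(id);
    // A key with a single value is passed through without reducing.
    var reduced = values.length > 1 ? reducer(key, values) : values[0];
    if (target.has(id)) {
      reduced = reducer(key, [target.get(id), reduced]);
    }
    target.set(id, reduced);
  });
  return target;
}

// "third" starts empty and accumulates across every source collection.
var third = new Map();
[first, second].forEach(function (docs) { mapReduceInto(docs, third); });

third.forEach(function (value, id) {
  console.log(JSON.stringify({ _id: JSON.parse(id), value: value }));
});
```

Merging more collections is then just a longer list to loop over, which mirrors running the same mapReduce call once per collection with the same "reduce" target and avoids the nested callbacks.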

You can make that a little fancier if you want fr and to to be treated as one unique combination regardless of their order, or even run another mapReduce or aggregate over those results.
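
For the "either order" case, one possible approach (a sketch, assuming fr and to are numbers) is to have the mapper always emit the smaller value first. The `emit` defined below is only a stand-in so the mapper can be tried outside MongoDB, and the reversed document is made up for illustration:

```javascript
var mapper = function () {
  // Put the smaller value first so (1, 2) and (2, 1) share one key.
  var lo = Math.min(this.fr, this.to);
  var hi = Math.max(this.fr, this.to);
  emit({ fr: lo, to: hi }, this.wg);
};

// Stand-in for the emit() that MongoDB provides inside mapReduce; it just
// records what the mapper emitted so the result can be inspected.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

mapper.call({ fr: 1, to: 2, wg: 53 });
mapper.call({ fr: 2, to: 1, wg: 7 }); // hypothetical reversed pair

console.log(JSON.stringify(emitted));
```

Both documents end up under the same key, so the reducer would add their wg values together.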

Licensed under: CC-BY-SA with attribution