Question

I have a huge data set (millions of documents) in the following format:

{
  "userid" : "codejammer",
  "data" : [       
   {"type" : "number", "value" : "23748"},
   {"type" : "message","value" : "one"}
  ]
}

I want to get the count of messages with value "one" for userid "codejammer".

The following are the mapReduce functions I am using:

Map

var map = function(){
   emit(this.data[0].value,1);
}

Reduce

var reduce = function(key,values){
    return Array.sum(values);
}

Options

var options = {
             "query":{"userid" : "codejammer",
                 "data.type" : "message"},
             "out" : "aggregrated"
            }

The mapReduce call executes successfully with the following output:

{
 "_id" : 23748,
  "value" : 1
}

But I am expecting the following output:

{
 "_id" : one,
 "value" : 1
}

The query filter in options is sending the entire array to the map function, even though I specifically ask for "data.type" : "message".

Is there any way to use a projection operator in the query filter to get only the required item in the array?

Thank you very much for your help.


Solution

You would be better off doing this with aggregate. There is no need for mapReduce in this case: the aggregation framework runs as native code and will be much faster than running through the JavaScript interpreter:

db.collection.aggregate([
    // Still makes sense to match the documents to reduce the set
    { "$match": {
        "userid": "codejammer",
        "data": { "$elemMatch": { 
            "type": "message", "value": "one" 
        }}
    }},

    // Unwind to de-normalize the array content
    { "$unwind": "$data" },

    // Filter the content of the array
    { "$match": {
        "data.type": "message",
        "data.value": "one"
    }},

    // Count all the matching entries
    { "$group": {
        "_id": null,
        "count": { "$sum": 1 }
    }}
])
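To make the stages a bit more concrete, here is a rough sketch in plain JavaScript of what each one does to the data. This is not MongoDB API, just ordinary array operations, and the `docs` sample and the `isTarget` helper are made up for illustration:

```javascript
// Plain-JavaScript sketch of the pipeline stages; not MongoDB API.
// `docs` is hypothetical sample data in the question's format.
const docs = [
  { userid: "codejammer", data: [
    { type: "number", value: "23748" },
    { type: "message", value: "one" }
  ]},
  { userid: "codejammer", data: [
    { type: "message", value: "one" },
    { type: "message", value: "two" }
  ]},
  { userid: "someone", data: [
    { type: "message", value: "one" }
  ]}
];

// Hypothetical helper: the condition used by both $match stages
const isTarget = (el) => el.type === "message" && el.value === "one";

// First $match: keeps whole documents that contain at least one matching element
const matched = docs.filter(
  (d) => d.userid === "codejammer" && d.data.some(isTarget)
);

// $unwind: one output row per array element
const unwound = matched.flatMap(
  (d) => d.data.map((el) => ({ userid: d.userid, data: el }))
);

// Second $match: keeps only the rows whose element matches
const filtered = unwound.filter((row) => isTarget(row.data));

// $group with _id: null and $sum: 1: counts the remaining rows
const result = { _id: null, count: filtered.length };

console.log(unwound.length, result);
```

The point to notice is that the first filter still lets the non-matching elements through, since it selects documents rather than array elements. Only after the unwind step can the individual elements be filtered.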

Of course, if you only ever have one "message" inside your "data" array, this becomes very simple:

db.collection.aggregate([
    // Match the documents you want
    { "$match": {
        "userid": "codejammer",
        "data": { "$elemMatch": { 
            "type": "message", "value": "one" 
        }}
    }},

    // Simply count the documents
    { "$group": {
        "_id": null,
        "count": { "$sum": 1 }
    }}
])

But of course that is no different from this:

db.collection.find({
    "userid": "codejammer",
    "data": { "$elemMatch": { 
        "type": "message", "value": "one" 
    }}
}).count()

So while there is a way to do this with mapReduce, the other ways shown are much better, especially from version 2.6 onwards, where the aggregation pipeline can use disk storage (the allowDiskUse option) to handle very large collections.

But to get the count using mapReduce, you were basically going about it the wrong way. A projection will not work on the input, so you need to pick the matching element out inside the map function. I'm still going to assume that there could be more than one matching value in your array, even if that is not the case:

db.collection.mapReduce(
    function() {
        var userid = this.userid;
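        this.data.forEach(function(doc) {
            // Compare the fields: == on two objects only tests identity
            if ( doc.type == condition.type && doc.value == condition.value )
                emit( userid, 1 ); 
        });
    },
    function(key,values) {
        // Sum rather than count, since reduce may be re-run on partial results
        return Array.sum(values);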
    },
    {
        "query": { 
            "userid": "codejammer",
            "data": { "$elemMatch": { 
                "type": "message", "value": "one" 
            }}
        },
        "scope": {
           "condition": {"type" : "message", "value" : "one"}
        },
        "out": { "inline": 1 }
    }
)
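One detail worth keeping in mind when filtering array elements inside map: in JavaScript, a comparison such as `doc == condition` between two objects tests whether they are the same object, not whether they have the same content, so the fields need to be compared individually. A small plain-JavaScript sketch (the `matches` helper is hypothetical, and `condition` mirrors the scope variable above):

```javascript
// Plain JavaScript; `condition` mirrors the scope variable in the mapReduce call.
const condition = { type: "message", value: "one" };
const element = { type: "message", value: "one" };

// == (and ===) on two objects tests identity, not content
const sameReference = (element == condition);

// Hypothetical helper: compare only the fields that matter
function matches(el, cond) {
  return el.type === cond.type && el.value === cond.value;
}

const sameContent = matches(element, condition);

console.log(sameReference, sameContent); // false true
```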

So in much the same way, this "emits" a value for the common key whenever an element matching your criteria is found inside the data array. You cannot project just the matching element; the map function receives all of them and you filter there.

Since you are only expecting one result, there is no point in outputting to a collection, so just return it inline.
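Because many values can be emitted for the same key, MongoDB may call reduce more than once for that key, feeding the output of earlier calls back in as input. That is why a counting reduce should add up its values rather than return how many it was given. A plain-JavaScript sketch of the difference (this only simulates the batching; the batch sizes and the `runInBatches` helper are made up):

```javascript
// Plain-JavaScript simulation of reduce being applied to partial results
// for the same key; not the MongoDB engine itself.
const reduceBySum = (key, values) => values.reduce((a, b) => a + b, 0);
const reduceByLength = (key, values) => values.length;

// Five emits of 1 for one key, assumed to be processed in two batches
const batchA = [1, 1, 1];
const batchB = [1, 1];

// Hypothetical helper: reduce each batch, then reduce the partial results
function runInBatches(reduce) {
  const partialA = reduce("codejammer", batchA);
  const partialB = reduce("codejammer", batchB);
  return reduce("codejammer", [partialA, partialB]);
}

const bySum = runInBatches(reduceBySum);       // 3 + 2
const byLength = runInBatches(reduceByLength); // length of [3, 2]

console.log(bySum, byLength); // 5 2
```

With only a handful of emits per key you may never see the second form go wrong, but on millions of documents it can undercount.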

But basically, use the aggregation method if you have to do this.

Licensed under: CC-BY-SA with attribution