Question

I just started working with MongoDB and I am having trouble using the mapReduce function. For some reason, it does not seem to be calling the map and reduce functions.

Here is my code:

@getMonthlyReports: (req, res) ->
    app_id = req.app.id
    start = moment().subtract('years', 1).startOf('month').unix()
    end = moment().endOf('day').unix()
    console.log(start)
    console.log(end)

    map = ->
        geotriggers = 0
        pushes = 0
        console.log("ok")
        date = moment(@timestamp).startOf('month').unix()
        for campaign in @campaigns
            if campaign.geotriggers?
                geotriggers += campaign.geotriggers
            else if campaign.pushes?
                pushes += campaign.pushes

        emit date,
            geotriggers: geotriggers
            pushes: pushes

    reduce = (key, values) ->
        console.log("ok")
        geotriggers = 0
        pushes = 0
        for value in values
            geotriggers += value.geotriggers
            pushes += value.pushes
        geotriggers: geotriggers
        pushes: pushes


    common.db.collection(req.app.id + "_daily_campaign_reports").mapReduce map, reduce,
        query:
            timestamp:
                $gte: start
                $lt: end

        out:
            inline: 1
    , (err, results) ->
        console.log(results)


        ResponseHelper.returnMessage req, res, 200, results

I added some console.log calls, and it seems the map and reduce functions are not being called. Also, results is undefined.

Is there something I am missing?

Solution

As I already noted in the comments, your mapReduce is failing because it calls a library function (from moment.js) that does not exist on your server. Apart from that, this is not really a good use of mapReduce.
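
To illustrate the moment.js point: the map and reduce functions are sent to the MongoDB server and run there, so they cannot see moment, and nothing they log can reach your Node console. A minimal sketch of the same map and reduce using only built-in JavaScript might look like the following. It assumes that timestamp is stored as a Unix timestamp in seconds and that months are bucketed in UTC; the emit stub is there only so the sketch can be run outside MongoDB.

```javascript
// Stand-in for MongoDB's server-side emit(); defined here only so the
// sketch can be exercised outside the database.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map: uses only the built-in Date, so it does not depend on moment.js.
// Assumption: this.timestamp is a Unix timestamp in seconds, bucketed in UTC.
function map() {
  var d = new Date(this.timestamp * 1000);
  var monthStart = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1) / 1000;
  var geotriggers = 0;
  var pushes = 0;
  (this.campaigns || []).forEach(function (campaign) {
    // Mirrors the if / else-if of the original CoffeeScript.
    if (campaign.geotriggers != null) {
      geotriggers += campaign.geotriggers;
    } else if (campaign.pushes != null) {
      pushes += campaign.pushes;
    }
  });
  emit(monthStart, { geotriggers: geotriggers, pushes: pushes });
}

// Reduce: sums the partial totals emitted for one month.
function reduce(key, values) {
  var total = { geotriggers: 0, pushes: 0 };
  values.forEach(function (value) {
    total.geotriggers += value.geotriggers;
    total.pushes += value.pushes;
  });
  return total;
}
```

When passing these to mapReduce, you would leave out the emit stub, since the server provides its own.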

While mapReduce has its uses, a simple aggregation case like this is better suited to the aggregation framework, which is a native C++ implementation, whereas mapReduce runs inside a JavaScript interpreter. As a result, the processing is much faster.

All you need are your existing Unix timestamp values for start and end, plus the current day of the month (dayOfMonth), to do the date math:

db.collection.aggregate([
    // Match documents using your existing start and end values
    { "$match": {
        "timestamp": { "$gte": start, "$lt": end }
    }},

    // Unwind campaigns array
    { "$unwind": "$campaigns" },

    // Group on the start of month value
    { "$group": {
        "_id": { 
            "$subtract": [
               "$timestamp",
               { "$mod": [ "$timestamp", 1000 * 60 * 60 * 24 * dayOfMonth ] }
            ]
        },
        "geotriggers": { 
            "$sum": {
                "$cond": [
                   "$campaigns.geotriggers",
                   1,
                   0
                ]
            }
        },
        "pushes": { 
            "$sum": {
                "$cond": [
                   "$campaigns.pushes",
                   1,
                   0
                ]
            }
        },
    }}
])

If I am reading your code correctly, each document contains a "campaigns" array, so to deal with this in the aggregation framework you use the $unwind pipeline stage to expose each array member as its own document.
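
As a rough illustration of what $unwind does, the following plain JavaScript sketch produces one output document per array element. This is not MongoDB code; the unwind helper and the sample document are made up for the example.

```javascript
// Hypothetical helper that imitates $unwind on one array field.
function unwind(docs, field) {
  var out = [];
  docs.forEach(function (doc) {
    (doc[field] || []).forEach(function (element) {
      var copy = Object.assign({}, doc); // shallow copy of the parent document
      copy[field] = element;             // replace the array with one element
      out.push(copy);
    });
  });
  return out;
}

// Made-up sample data shaped like the documents in the question.
var sample = [
  { timestamp: 1393632000, campaigns: [{ geotriggers: 2 }, { pushes: 3 }] }
];

// Two documents come out, each carrying a single campaigns object.
var unwound = unwind(sample, "campaigns");
```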

The date math is done in the $group stage for the _id key, by changing the "timestamp" value to the starting date of the month, which is what your code is trying to do. Arguably you could just use null here, as your range selection is only going to result in a single date value, but this is just to show that the date math is possible.
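
The $subtract / $mod combination rounds a number down to a multiple of a window size. A small plain JavaScript sketch of that arithmetic, assuming timestamps in seconds (floorToWindow is a name invented here), may make it easier to check which bucket boundaries you get. Note that the boundaries fall on multiples of the window counted from the Unix epoch, so a fixed window lines up exactly with UTC days but only approximately with calendar months, which vary in length.

```javascript
// Round a Unix timestamp (seconds) down to the start of its window.
// Same arithmetic as { $subtract: [ts, { $mod: [ts, windowSeconds] }] }.
function floorToWindow(timestamp, windowSeconds) {
  return timestamp - (timestamp % windowSeconds);
}

var ONE_DAY = 60 * 60 * 24;                     // seconds in a day
var noon = Date.UTC(2014, 2, 15, 12) / 1000;    // 2014-03-15 12:00 UTC
var startOfDay = floorToWindow(noon, ONE_DAY);  // 2014-03-15 00:00 UTC
```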

With the "unwound" array elements, we process every element just as the for loop does, and conditionally add to the "geotriggers" and "pushes" totals using the $cond operator. Again, this presumes, as in your code, that these fields evaluate to boolean true/false, which is the evaluation part of $cond.
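
If the fields hold numeric counters rather than flags, as the += in your map function suggests, one possible variant sums the values themselves and uses $ifNull to treat a missing field as 0. This is only a sketch of the $group stage under that assumption. The _id of null is a placeholder that groups everything into one result, and unlike your if / else if, it counts both fields when an element has both.

```javascript
// Sketch of a $group stage that sums numeric counters.
// Assumption: campaigns.geotriggers and campaigns.pushes are numbers when present.
var groupStage = {
  "$group": {
    "_id": null, // placeholder grouping key; substitute your month expression
    "geotriggers": { "$sum": { "$ifNull": ["$campaigns.geotriggers", 0] } },
    "pushes": { "$sum": { "$ifNull": ["$campaigns.pushes", 0] } }
  }
};
```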

Your query condition is handled by the $match stage at the start of the pipeline, using the same range query.

That basically does the same thing without relying on additional libraries in server-side processing, and does it much faster as well.

See the other Aggregation Framework operators for reference.

Licensed under: CC-BY-SA with attribution