Question

I am looking to manipulate some records in my MongoDB collection before I pass them on to an aggregate function. In particular, I need to sum some properties within each document before I sum those per-document totals across the collection.

The summing of the properties cannot initially be done in an aggregation query because the property names vary in the original collection. For example, I am starting with something like:

{ timestamp: 1346774400000, foo3: 12, foo45: 13, foo9: 2 }, 
{ timestamp: 1346796000000, foo7: 33, foo2: 5 }

I need to modify each document to sum up the values for each property beginning with "foo", then sum all these values for each document in the collection.

I wrote a map operation to do so, which would produce something like:

{ timestamp: 1346774400000, foo_total: 27 }, 
{ timestamp: 1346796000000, foo_total: 38 } 

...but I cannot perform an aggregate function on the output of db.collection.map().

Is there any way to accomplish this, or is there a better method? I am not able to change the existing structure of the documents, I would like to avoid a map reduce operation, and I do not want to offload this operation into code.


Solution

As stated, the problem with the differing key names in your documents is that aggregate cannot work on them directly, at least not without knowing all of the possible names and writing what would be a very long statement.

Of course your present approach is processing the collection results after retrieving them and does not actually result in a collection itself, so there is no way to possibly pass this to aggregation.

So the best approach is to pass the whole thing off to mapReduce, and the logic is fairly simple. First, a mapper:

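var mapper = function () {

  // Leading alphabetic part of a field name, e.g. "foo" from "foo45"
  var patt = /^[a-zA-Z]+/;

  var total = {};

  for ( var n in this ) {

      // Skip the fields that should not be summed
      if ( n == "timestamp" || n == "_id" )
        continue;

      // Skip field names that do not start with a letter
      var match = patt.exec(n);
      if ( match === null )
        continue;

      var key = match[0];
      if (!total.hasOwnProperty(key))
        total[key] = 0;

      total[key] += this[n];

  }

  // A single null key, so all documents are reduced into one result
  emit( null, total );

};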

So very simply, this is just going to "interrogate" the field names while excluding any that you know you do not need. In this case, it uses a regex to match the leading alphabetic characters in the field name. I'm allowing for the possibility that fields could be "foo16", "bar32", "baz12", and none of this would matter to the operation. At any rate, you need some method for stripping out the part of the field name that you want.

These values are summed per document and sent through to the reducer under the only "key" there is, which is null.
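To make the prefix grouping concrete, here is a minimal sketch that runs the same per-document logic in plain JavaScript, outside MongoDB. The helper name prefixTotals and the mixed-prefix document are made up for illustration; they are not part of the original collection.

```javascript
// Hypothetical helper: the mapper's per-document logic as a plain function.
function prefixTotals(doc) {
  var patt = /^[a-zA-Z]+/;   // leading letters of the field name
  var total = {};
  for (var n in doc) {
    if (n === "timestamp" || n === "_id") continue;
    var match = patt.exec(n);
    if (match === null) continue;   // field name does not start with a letter
    var key = match[0];
    if (!total.hasOwnProperty(key)) total[key] = 0;
    total[key] += doc[n];
  }
  return total;
}

// Invented document with several prefixes, only to show the grouping.
var mixedDoc = { timestamp: 1346774400000, foo16: 1, bar32: 2, baz12: 3, foo2: 4 };
var totals = prefixTotals(mixedDoc);

console.log(JSON.stringify(totals)); // {"foo":5,"bar":2,"baz":3}
```

Each distinct prefix gets its own running total, so documents that only contain "foo" fields simply produce a single "foo" entry.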

So in the reducer:

var reducer = function (key,values) {

  var reduced = {};

  values.forEach(function(value) {
    for ( var n in value ) {
      if ( !reduced.hasOwnProperty(n) )
        reduced[n] = 0;

      reduced[n] += value[n];
    }
  });

  return reduced;

};

This similarly cycles through each value that was emitted and sums the results for each "field" found in order to produce the result:

{
    "results" : [
            {
                    "_id" : null,
                    "value" : {
                            "foo" : 65
                    }
            }
    ],
    "timeMillis" : 7,
    "counts" : {
            "input" : 2,
            "emit" : 2,
            "reduce" : 1,
            "output" : 1
    },
    "ok" : 1,
}

That is just based on the sample documents that you have.
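The whole flow can be checked locally without a database. The following is only a sketch: the emit function and the emitted array are stand-ins for what MongoDB does internally, and the mapper uses a slightly tidied regex with a guard for field names that do not start with a letter.

```javascript
// The two sample documents from the question.
var sampleDocs = [
  { timestamp: 1346774400000, foo3: 12, foo45: 13, foo9: 2 },
  { timestamp: 1346796000000, foo7: 33, foo2: 5 }
];

// Stand-in for MongoDB's emit(): just collect the emitted values.
var emitted = [];
function emit(key, value) { emitted.push(value); }

var mapper = function () {
  var patt = /^[a-zA-Z]+/;
  var total = {};
  for (var n in this) {
    if (n == "timestamp" || n == "_id") continue;
    var match = patt.exec(n);
    if (match === null) continue;
    if (!total.hasOwnProperty(match[0])) total[match[0]] = 0;
    total[match[0]] += this[n];
  }
  emit(null, total);
};

var reducer = function (key, values) {
  var reduced = {};
  values.forEach(function (value) {
    for (var n in value) {
      if (!reduced.hasOwnProperty(n)) reduced[n] = 0;
      reduced[n] += value[n];
    }
  });
  return reduced;
};

// Run the mapper with each document bound to `this`, as MongoDB does.
sampleDocs.forEach(function (doc) { mapper.call(doc); });
var result = reducer(null, emitted);
console.log(JSON.stringify(result)); // {"foo":65}

// MongoDB may feed reducer output back into the reducer, so the output
// must have the same shape as the emitted values. Check that here.
var partial = reducer(null, [emitted[0]]);
var rereduced = reducer(null, [partial, emitted[1]]);
console.log(JSON.stringify(rereduced)); // {"foo":65}
```

In the mongo shell, the output shown above would come from a call along the lines of db.collection.mapReduce(mapper, reducer, { out: { inline: 1 } }), where "collection" is a placeholder for your collection name.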

Licensed under: CC-BY-SA with attribution