Question

I started exploring MongoDB a couple of weeks ago. My scenario: I have a collection with 3 million records.

I want to aggregate this collection, grouping on two keys and applying a match condition. I used the aggregation framework for this, and learned that an aggregation fails if the resulting document (an array) exceeds 16 MB.

I hit this limit when I tried it. I am now trying to use map reduce and would appreciate guidance on implementing it. How can I overcome the 16 MB size limit by using map reduce?

I also read that I could split the collection into multiple collections and aggregate each one separately. Could anyone point me in the right direction?

Solution

Even without code there are basic answers to your questions.

The 16MB BSON document size limit applies to "inline" responses. That means a response that is returned directly as a single document, rather than having its individual "documents" written to a collection.

So with mapReduce a statement much like this:

db.collection.mapReduce(
    mapper,
    reducer,
    { "out": { "inline": 1 } }
)

Has the problem that the "array" in the response needs to be under 16MB. But if you change this to output to a collection:

db.collection.mapReduce(
    mapper,
    reducer,
    { "out": { "replace": "newcollection" } }
)

Then you no longer have this limitation.
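As a rough sketch of what the mapper and reducer could look like for grouping on two keys with a match condition: the field names `status`, `region`, `category` and `amount` are assumptions made up for illustration, not taken from the question. The `query` option plays the role of the match condition, and the `db` call is guarded so the function definitions can also be loaded outside the shell.

```javascript
// Hypothetical fields: status, region, category, amount.
var mapper = function () {
    // Composite key, so there is one result per (region, category) pair.
    emit(
        { region: this.region, category: this.category },
        { total: this.amount, count: 1 }
    );
};

var reducer = function (key, values) {
    // Return the same shape the mapper emits, since reduce may be
    // called more than once for the same key.
    var result = { total: 0, count: 0 };
    values.forEach(function (v) {
        result.total += v.total;
        result.count += v.count;
    });
    return result;
};

// Only runs inside a MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
    db.collection.mapReduce(mapper, reducer, {
        "query": { "status": "A" },            // the "match" condition
        "out": { "replace": "newcollection" }  // results go to a collection
    });
}
```

Each document in "newcollection" would then hold the composite key in _id and the reduced totals in value.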

The same applies to the aggregate method from versions 2.6 and upwards using the $out pipeline stage:

db.collection.aggregate([
   // lots of pipeline

   { "$out": "newcollection" }

])

This overcomes the limitation by the same means, by outputting to a collection.

In fact, from version 2.6 and upwards the aggregate method itself returns a cursor, just like the .find() method, so its results are also not subject to this limitation.
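A minimal sketch of the cursor form, grouping on two keys after a match. The field names `status`, `region`, `category` and `amount` are hypothetical, chosen only for illustration. The pipeline is built as a plain array first, and the `db` call is guarded so it only runs inside a MongoDB shell.

```javascript
// Hypothetical fields: status, region, category, amount.
var pipeline = [
    // Filter first so only matching documents reach the grouping stage.
    { "$match": { "status": "A" } },
    // Group on two keys by using a composite _id.
    { "$group": {
        "_id": { "region": "$region", "category": "$category" },
        "total": { "$sum": "$amount" },
        "count": { "$sum": 1 }
    } }
];

// Only runs inside a MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
    // The returned cursor is iterated one result document at a time.
    db.collection.aggregate(pipeline).forEach(function (doc) {
        printjson(doc);
    });
}
```

Each grouped result still has to fit within 16MB as an individual document, but the result set as a whole does not.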

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow