Question

I have a sharded input collection that I want to filter before sending it to my Hadoop cluster for MapReduce computations.

I pass this parameter in my hadoop jar command:

mongo.input.query='{_id.uuid:"device-964693"}'

and it works: the job does not process any data that fails to satisfy this query.

This however does not work:

mongo.input.query='{_id.day:{\\$lt:{\\$date:1388620740000}}}'

No data is produced as output.

1388620740000 represents the date Wed Jan 01 2014 23:59:00 GMT+0000 (GMT). The setup uses Hadoop 2.2, MongoDB 2.4.9, and connector version 2.2-1.2.0.

There are no error messages, just the standard Hadoop success message.

Is my syntax incorrect, or did I miss something?

Could you point me to some debugging tools/methods for this?


Solution

Debugging methods:

In the mongo shell, while the job is running:

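// List in-progress operations (including idle ones) on test.collection
// whose query filters on _id.day, to see what the connector actually sent.
db.currentOp(true).inprog.forEach(
   function(d){
     // guard: not every operation carries a nested query document
     if(d.ns == "test.collection" && d.query && d.query.query && d.query.query["_id.day"])
        printjson(d);
     })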

A terminal-friendly syntax for the query (every key quoted, the date given as an ISO-8601 string):

$ hadoop jar... ...mongo.input.query='{"_id.day":{"$lt":{"$date":"2014-01-19T23:00:00Z"}}}'
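
To avoid hand-editing the date, the query value can be generated from the epoch-millisecond timestamp used in the question. This is only a minimal sketch: buildLtDateQuery is a hypothetical helper, not part of the connector, and it assumes the query should have the same shape as the one above (quoted keys, a UTC date string without milliseconds).

```javascript
// Hypothetical helper (not part of the connector): builds the value for
// mongo.input.query from a field name and an epoch-millisecond cutoff.
function buildLtDateQuery(field, epochMillis) {
  // toISOString() always renders UTC with milliseconds; drop the
  // milliseconds so the string matches the format used above.
  const iso = new Date(epochMillis).toISOString().replace(/\.\d{3}Z$/, "Z");
  // JSON.stringify quotes every key, including "$lt" and "$date".
  return JSON.stringify({ [field]: { $lt: { $date: iso } } });
}

const query = buildLtDateQuery("_id.day", 1388620740000);
// Single quotes keep the shell from expanding $lt and $date.
console.log("mongo.input.query='" + query + "'");
```

The printed value can be pasted into the command in place of the hand-written query. Running the currentOp check above while the job is active then shows whether the query reached MongoDB in the expected form.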
Licensed under: CC-BY-SA with attribution