I have a 5 million entries in a Mongo DB that look like this:
{
"_id" : ObjectId("525facace4b0c1f5e78753ea"),
"productId" : null,
"name" : "example name",
"time" : ISODate("2013-10-17T09:23:56.131Z"),
"type" : "hover",
"url" : "www.example.com",
"userAgent" : "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 openssl/0.9.8r zlib/1.2.5"
}
I need to add to every entry a new field called device
which will have either the value desktop
or mobile
. That means, the goal would be to have the following kind of entries:
{
"_id" : ObjectId("525facace4b0c1f5e78753ea"),
"productId" : null,
"device" : "desktop",
"name" : "example name",
"time" : ISODate("2013-10-17T09:23:56.131Z"),
"type" : "hover",
"url" : "www.example.com",
"userAgent" : "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 openssl/0.9.8r zlib/1.2.5"
}
I am working with the MongoDB Java driver and so far I am doing the following:
DBObject query = new BasicDBObject();
query.put("device", new BasicDBObject("$exists", false)); //some entries already have such field
DBCursor cursor = resource.find(query);
cursor.addOption(Bytes.QUERYOPTION_NOTIMEOUT);
Iterator<DBObject> iterator = cursor.iterator();
int size = cursor.count();
And then I am iterating with a while(iterator.hasNext())
, doing an if-else with a huge regular expression I found out there, and depending of the result of such if-else I execute something like:
BasicDBObject newDocument = new BasicDBObject("$set", new BasicDBObject().append("device", "desktop")); //of "mobile", depending on the if-else
BasicDBObject searchQuery = new BasicDBObject("_id", id);
resource.getCollection(DatabaseConfiguration.WEBSITE_STATISTICS).update(searchQuery, newDocument);
However, due to the big amount of data (more than 5 million entries) this takes forever.
Is there a way of doing this with map reduce? So far I've only used MapReduce for counting, so I am not sure if it can be used for other matters.