Question

I am on a project that uses the old morphia 0.99 in a Maven Spring MVC application, with no realistic option to upgrade to the latest supported versions.

I am attempting to search a very large (63 million document) collection to validate identifiers that a user has uploaded via a CSV document. The goal is to associate a given id with an agnt_id. The documents are something like:

{
  "_id" : ObjectId("53430e789f0b37b71976d2f1"),
  "agnt_id" : 1234,
  "a" : 793,
  "b" : "D2QT",
  "c" : "85B56",
  "d" : 119,
  "e" : "comm",
  "f" : "2C27",
  "g" : "DE00Z29UU3",
  "h" : "",
  "i" : "DE00M83DH1",
  "j" : 13211
}

I am currently using

Query<IdentifierVO> query = ds.createQuery(IdentifierVO.class).field(propertyName).in(identifiers).retrievedFields(true, propertyName, "agnt_id");

to return results, but this often returns over 100,000 documents, which carries a major network cost and is far less efficient than I'm comfortable with.

I am attempting to execute the following command, which returns exactly what I need without all the extra documents, but I am unable to make it work:

db.runCommand({
  group: {
    ns: 'id_search',
    key: {
        agnt_id: 1,
        g: 1
    },
    cond: {
        g: {
            $in: [
                32008,
                11989
            ]
        }
    },
    $reduce: function(curr, result) {},
    initial: {}
  }
})

But my attempts fail with errors (rawQuery is the command text above):

Object result = ds.getDB().authenticate("mongousr", password); //password is char[]. Auth returns true

Object result2 = ds.getDB().eval(rawQuery);

results in :

22:55:20,215 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/clientportal].[appServlet]] (http--0.0.0.0-8080-5) Servlet.service() for servlet appServlet threw exception: com.mongodb.CommandFailureException: { "serverUsed" : "redacted" , "ok" : 0.0 , "errmsg" : "unauthorized"}

Is there any way to get MongoDB to run the command I want, or is there another way to group the data to reduce the dataset being transferred?


Solution

You really should not be using the group command. Much like mapReduce, it runs its reduce function as arbitrary JavaScript through an interpreter, which is not very efficient.

What you need is the aggregate() method. A basic shell example:

    db.collection.aggregate([
        { "$match": {
            "g": { "$in": [ 32008, 11989 ] }
        }},
        { "$group": {
            "_id": {
                "agnt_id": "$agnt_id",
                "g": "$g"
            }
        }}
    ])

You should have access to the actual aggregate command without resorting to the "runCommand" method. In terms of access, it should fall under the standard "read" methods.

Alternatively, from Morphia you should be able to work off a Datastore instance to get the raw collection as implemented in the driver. Something like:

    import com.mongodb.AggregationOutput;
    import com.mongodb.BasicDBList;
    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;

    // Get the raw driver collection by name through the underlying DB handle
    DBCollection collection = ds.getDB().getCollection("collection");

    // Values for the $in condition
    BasicDBList myargs = new BasicDBList();
    myargs.add(32008);
    myargs.add(11989);

    // $match stage: keep only documents whose "g" is one of the values
    BasicDBObject match = new BasicDBObject("$match",
        new BasicDBObject("g",
            new BasicDBObject("$in", myargs)
        )
    );

    // $group stage: one result per distinct (agnt_id, g) pair
    BasicDBObject group = new BasicDBObject("$group",
        new BasicDBObject("_id",
            new BasicDBObject("agnt_id", "$agnt_id")
                .append("g", "$g")
        )
    );

    // Aggregate: each pipeline stage is passed as its own argument
    AggregationOutput output = collection.aggregate(match, group);
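
Each document produced by the `$group` stage carries the grouped fields under `_id`. A minimal sketch of turning such rows into an identifier-to-`agnt_id` lookup follows. It uses only JDK types so that it runs on its own. With the driver, the rows would presumably come from `output.results()`, and the sketch assumes each row and its `_id` value behave as a `java.util.Map` (as `BasicDBObject` does). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Morphia or the MongoDB driver.
public class GroupedIds {

    // Builds identifier -> list of agnt_id values from rows shaped like
    // { "_id" : { "agnt_id" : ..., "g" : ... } }.
    public static Map<Object, List<Object>> byIdentifier(Iterable<?> rows, String idField) {
        Map<Object, List<Object>> lookup = new LinkedHashMap<Object, List<Object>>();
        for (Object row : rows) {
            // Assumption: each row is a Map, and so is its "_id" value.
            Map<?, ?> key = (Map<?, ?>) ((Map<?, ?>) row).get("_id");
            Object identifier = key.get(idField);
            List<Object> agents = lookup.get(identifier);
            if (agents == null) {
                agents = new ArrayList<Object>();
                lookup.put(identifier, agents);
            }
            agents.add(key.get("agnt_id"));
        }
        return lookup;
    }

    public static void main(String[] args) {
        // Stand-in for a single row of aggregation output.
        Map<String, Object> id = new LinkedHashMap<String, Object>();
        id.put("agnt_id", 1234);
        id.put("g", 32008);
        Map<String, Object> row = new LinkedHashMap<String, Object>();
        row.put("_id", id);

        List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
        rows.add(row);
        System.out.println(byIdentifier(rows, "g"));
    }
}
```

Keeping a list per identifier is a deliberate choice here: the same identifier may be grouped with more than one `agnt_id`, and the upload validation probably wants to see that rather than silently keep one.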

There are some other builder options available that make the process nicer if you are fine with mixing dependencies, but this gives the basic idea.

OTHER TIPS

I found the solution I needed:

First of all, "eval" is the wrong method to use. It runs JavaScript on the server, which is not what I needed for my purposes. Instead, I constructed a DBObject and ran the command that way. The working code is below, where identifiers is a Set and propertyName is a String:

    DBObject group = new BasicDBObject();
    DBObject groupparams = new BasicDBObject();
    DBObject key = new BasicDBObject();
    DBObject cond = new BasicDBObject();
    DBObject in = new BasicDBObject();

    groupparams.put("ns", "port_identifiers");
    key.put("agnt_id", 1);
    key.put(propertyName, 1);
    groupparams.put("key", key);

    in.put("$in", identifiers);
    cond.put(propertyName, in);
    groupparams.put("cond", cond);

    groupparams.put("$reduce", "function(curr, result) {}");
    groupparams.put("initial", new BasicDBObject());

    group.put("group", groupparams);

    CommandResult result = ds.getDB().command(group);

This produces a command like:

{ "group" : { "ns" : "id_search" , "key" : { "agnt_id" : 1 , "z" : 1} , "cond" : { "acid" : { "$in" : [ 7943 , 11330]}} , "$reduce" : "function(curr, result) {}" , "initial" : { }}}

and returns:

{ "serverUsed" : "redacted" , "retval" : [ { "agnt_id" : 111464.0 , "z" : 7943.0} , { "agnt_id" : 111466.0 , "z" : 11330.0}] , "count" : 112034.0 , "keys" : 2 , "ok" : 1.0}
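
The numeric values in the reply come back as doubles (for example `111464.0`). A minimal sketch of converting the `retval` array into a lookup of whole-number ids follows. It assumes `retval` is a `java.util.List` whose elements are `java.util.Map` instances (which should hold for the driver's `BasicDBList` and `BasicDBObject`), and that the identifier field is numeric. The class and method names are hypothetical. With the driver, the argument would presumably be `result.get("retval")`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Morphia or the MongoDB driver.
public class GroupRetval {

    // Maps each identifier to its agnt_id, converting the doubles seen in
    // the reply to long values. If an identifier appears with several
    // agnt_id values, the last one seen wins.
    public static Map<Long, Long> toLookup(Object retval, String idField) {
        Map<Long, Long> lookup = new LinkedHashMap<Long, Long>();
        // Assumption: retval is a List whose elements are Maps.
        for (Object item : (List<?>) retval) {
            Map<?, ?> doc = (Map<?, ?>) item;
            Number identifier = (Number) doc.get(idField);
            Number agent = (Number) doc.get("agnt_id");
            lookup.put(identifier.longValue(), agent.longValue());
        }
        return lookup;
    }

    public static void main(String[] args) {
        // Stand-in for the "retval" array in the reply.
        Map<String, Object> first = new LinkedHashMap<String, Object>();
        first.put("agnt_id", 111464.0);
        first.put("z", 7943.0);
        Map<String, Object> second = new LinkedHashMap<String, Object>();
        second.put("agnt_id", 111466.0);
        second.put("z", 11330.0);

        List<Object> retval = new ArrayList<Object>();
        retval.add(first);
        retval.add(second);
        System.out.println(toLookup(retval, "z"));
    }
}
```

String identifiers such as the `g` and `i` values in the sample document would need a variant that keeps the identifier as an `Object` or `String` instead of casting it to `Number`.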

Licensed under: CC-BY-SA with attribution