Question

I'm new to MongoDB and I'm looking for a clean way to sort and remove with a single command:

{u'house_id': 199, u'_id': ObjectId('50906d7fa3c412bb040eb896'), u'type': u'house', u'rate': 58.09608083191365}
{u'house_id': 199, u'_id': ObjectId('50906d7fa3c412bb040eb895'), u'type': u'house', u'rate': 49.34223066136407}
{u'house_id': 198, u'_id': ObjectId('50906d7fa3c412bb040eb891'), u'type': u'house', u'rate': 76.18366499496366}
{u'house_id': 198, u'_id': ObjectId('50906d7fa3c412bb040eb892'), u'type': u'house', u'rate': 17.46279901047208}

How can I remove the document with the lowest rate for each house_id?

Solution

Unfortunately, the remove and update commands do not yet allow generic cursor methods within them (https://jira.mongodb.org/browse/SERVER-1599), so the best way currently is to do a find and then a remove:

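// Find the documents for one house, lowest rate first.
var houses = db.collection.find({house_id: 199}).sort({rate: 1});
houses.forEach(function(doc){
    // Use the callback parameter "doc"; "house" was never defined.
    // Without .limit(1) on the cursor, this removes every matched document.
    db.collection.remove({_id: doc._id});
});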

That is currently the best way.

OTHER TIPS

While the basic answer here is that you need to loop over results, you will probably do better by fetching all of the "minimum value" documents in one hit. The aggregation framework is useful for this, since you can combine $sort with the $first operator in $group:

var result  = db.collection.aggregate([
    { "$sort": { 
        "house_id": 1,
        "rate": 1
    }},
    { "$group": {
        "_id": "$house_id",
        "docId": { "$first": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": {
        "count": { "$gt": 1 }
    }}
])

That returns one result per "house_id", holding the _id of its lowest-rated document, and discards any "house_id" that has only one document, since you would not want to remove that one.
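To make the pipeline's behaviour concrete, here is a minimal sketch in plain JavaScript that mirrors the $sort, $group and $match stages on an in-memory array. It is not MongoDB code and only illustrates the selection logic: the function name lowestRatedPerHouse, the shortened string _id values, and the extra document for house 200 are assumptions added for illustration.

```javascript
// Hypothetical in-memory stand-in for the collection. The _id values are
// shortened to plain strings because ObjectId only exists in the shell.
const docs = [
  { _id: "b896", house_id: 199, type: "house", rate: 58.09608083191365 },
  { _id: "b895", house_id: 199, type: "house", rate: 49.34223066136407 },
  { _id: "b891", house_id: 198, type: "house", rate: 76.18366499496366 },
  { _id: "b892", house_id: 198, type: "house", rate: 17.46279901047208 },
  // Assumed extra document: a house with only one entry.
  { _id: "b8a0", house_id: 200, type: "house", rate: 33.0 }
];

// Mirrors $sort -> $group ($first, $sum) -> $match (count > 1).
function lowestRatedPerHouse(input) {
  // $sort: house_id ascending, then rate ascending.
  const sorted = input.slice().sort(
    (a, b) => a.house_id - b.house_id || a.rate - b.rate
  );

  // $group: keep the first _id seen per house_id and count the members.
  const groups = new Map();
  for (const doc of sorted) {
    const group = groups.get(doc.house_id);
    if (group) {
      group.count += 1;
    } else {
      groups.set(doc.house_id, { _id: doc.house_id, docId: doc._id, count: 1 });
    }
  }

  // $match: drop groups that contain a single document.
  return Array.from(groups.values()).filter(g => g.count > 1);
}

const selected = lowestRatedPerHouse(docs);
console.log(selected.map(g => g.docId));
```

With this sample data the selected docId values are "b892" and "b895", and house 200 is left alone because it has only one document. The sketch returns groups in sorted order, whereas the real $group stage does not guarantee any output order.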

Then, if the result is small enough to handle in one go, you can pass all of those results to the $in operator, with a little mapping to extract just the _id values you need:

var ids = [];
result.result.forEach(function(doc) {
    ids.push( doc.docId );
});

db.collection.remove({ "_id": { "$in": ids } })

Note also that the default form of .remove() acts on all matched documents unless the optional "justOne" argument is specified to remove only one. That is fine for this purpose.

From MongoDB 2.6 you get access to a "cursor" returned with aggregate results, so you get options to improve this over large result sets:

var ids = [];
var cursor = db.collection.aggregate([
    { "$sort": { 
        "house_id": 1,
        "rate": 1
    }},
    { "$group": {
        "_id": "$house_id",
        "docId": { "$first": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": {
        "count": { "$gt": 1 }
    }}
]);

cursor.forEach(function(doc) {
    ids.push( doc.docId );

    if ( ids.length % 500 == 0 ) {
        db.collection.remove({ "_id": { "$in": ids } });
        ids = [];
    }

});

if ( ids.length > 0 )
    db.collection.remove({ "_id": { "$in": ids } });

Or use the equivalent implementation in whatever language you are working with, following that basic structure.
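The batching structure itself does not depend on MongoDB, so it can be sketched as a small plain JavaScript helper. The name removeInBatches and the flush callback are hypothetical; in the shell, the callback would be where db.collection.remove({ "_id": { "$in": batch } }) is called.

```javascript
// Hypothetical helper: walk any iterable of ids and hand them to `flush`
// in batches of at most `size`, including a final partial batch.
function removeInBatches(ids, size, flush) {
  let batch = [];
  for (const id of ids) {
    batch.push(id);
    if (batch.length === size) {
      flush(batch);
      batch = [];
    }
  }
  // Flush whatever is left over after the loop.
  if (batch.length > 0) {
    flush(batch);
  }
}

// Stand-in for the real remove call: just record each batch.
const calls = [];
removeInBatches([1, 2, 3, 4, 5, 6, 7], 3, batch => calls.push(batch));
console.log(calls);
```

Here seven ids with a batch size of 3 produce three calls: two full batches and one final batch holding the single remaining id.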

So you are not exactly "piping" or "sub-querying" results, as operations like that are not supported. But the $in operator is an efficient way to combine the two steps, and aggregation gives you an effective method of finding your "lowest" results.

This should generally be more efficient than looping over every possible "house_id" value with .find() plus the .sort() and .limit(1) modifiers, as you may have implemented or as was otherwise suggested here.

Also, unlike what was otherwise suggested, this does not remove "all" of your documents. That could happen even if you added .limit(1) to your find (which was not shown), because you would not know whether a "house_id" had only one document. And you probably do not want to remove your only document.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow