I am relatively new to MongoDB, but we are considering using it as a sort of cache in front of a legacy service. While looking into this, we have stumbled across some issues.
First, some explanation.
This caching service will sit between a legacy service and its clients. Clients will connect to the caching service, which gets its data from the legacy service. The caching service fetches data every X minutes and keeps it in MongoDB. The schema is as simple as it can get: just a flat document with lots of keys/values, no nested documents or anything like that. In addition, we set the _id to a unique ID from the legacy service, so we have control over that as well.
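For illustration, one cached document might look roughly like this (the field names here are invented; the real documents are simply ~20 flat key/value pairs):

```python
# Hypothetical example of one cached document; field names are made up.
doc = {
    "_id": "legacy-4711",        # the unique ID taken from the legacy service
    "name": "Some object",
    "status": "active",
    "last_modified": "2012-06-01T10:00:00Z",
    # ... about 20 flat key/value pairs in total, no nested documents
}
```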
When the caching service fetches data from the legacy service, it gets just a delta (only changes since the last fetch). So, if 5 "objects" have changed since last time, you get just those 5 "objects" (but you get each complete object, not a delta of the object). If any new "objects" have been added to the legacy service, those are of course also included in the delta.
Our "problem"
In my mind, this sounds like an upsert: if there are new objects, insert them; if there are changes to existing objects, update them. However, MongoDB does not seem to be particularly fond of multiple upserts in one go. Just inserting gives me an error about duplicate keys, which is perfectly understandable since a document with the same _id already exists. The update function, which can take an upsert parameter, cannot take a list of new objects. It seems to me that a single query is not possible, though I may well have completely overlooked something here.
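To make this concrete, here is a minimal sketch of both attempts, assuming Python with pymongo purely for illustration (our actual driver may differ; the database/collection names and the sample data are made up):

```python
from pymongo import MongoClient
from pymongo.errors import BulkWriteError

client = MongoClient()                 # assumes a local mongod on the default port
coll = client["cache"]["objects"]      # hypothetical db/collection names

# A made-up delta: complete objects from the legacy service, _id already set.
delta = [
    {"_id": "legacy-4711", "name": "Some object", "status": "active"},
    {"_id": "legacy-4712", "name": "Another object", "status": "inactive"},
]

# Attempt 1: a plain bulk insert. This fails with a duplicate key error
# as soon as one of the objects already exists in the collection.
try:
    coll.insert_many(delta)
except BulkWriteError:
    pass  # E11000 duplicate key error on the already-cached _ids

# Attempt 2: an upsert. This works, but only for a single filter/document
# pair, not for a whole list of new objects.
coll.replace_one({"_id": delta[0]["_id"]}, delta[0], upsert=True)
```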
Possible solutions
There are a number of different solutions, and two in particular come to my mind:
- Do two queries: first, compute a list of all the _ids (remember, we get these from the legacy service); then delete the matching documents using the $in operator together with that _id list, and immediately insert the new documents. In practice, this updates our collection with the new data, and it is easy to implement (see the first sketch after this list). A problem that might occur is that a client asks for data between the delete and the insert and therefore wrongly gets an empty result. That is a deal breaker and absolutely cannot happen.
- Do one upsert per changed object (see the second sketch below). This is also quite easy to implement and should not suffer from the problem above. It has other (maybe imaginary) problems, though: how many upserts can MongoDB handle in a short amount of time? Could it comfortably handle 5000 upserts every minute? These are not big documents, just about 20 key/values and no subdocuments. That number is pulled out of thin air; it is quite hard to predict the actual load. In my mind, this approach feels wrong: I cannot understand why it should be necessary to run one query per new document.
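For concreteness, here is a minimal sketch of the first solution, reusing the hypothetical coll and delta from the sketch above. The gap between the delete and the insert is exactly the window I am worried about:

```python
# Collect the _ids of all changed objects (we get these from the legacy service).
ids = [obj["_id"] for obj in delta]

# Step 1: delete the old versions of the changed objects.
coll.delete_many({"_id": {"$in": ids}})

# <-- a client reading right here would wrongly see the objects as missing

# Step 2: immediately insert the fresh versions.
coll.insert_many(delta)
```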
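And a sketch of the second solution: one upsert per changed object, using a full-document replace with upsert=True, since we always receive the complete object from the legacy service:

```python
# One round trip to MongoDB per changed object; with ~5000 changed
# objects per minute this means ~5000 individual upserts.
for obj in delta:
    coll.replace_one({"_id": obj["_id"]}, obj, upsert=True)
```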
Any help would be much appreciated, both regarding the two proposed solutions and any alternatives. As a side note, the choice of technology is not really up for discussion, so please do not suggest other kinds of databases or languages. There are other, strong reasons why we have chosen what we have chosen :)