Question

I'm inserting a lot of test records in a mongodb instance, via a mongo shell script.

I use batch inserts for performance: db.messages.save(messagesBatch);

However, mongo upserts or updates my data instead of inserting it!

After cleaning the collection, I run a loop for 200 inserts, in batches of 50. I end up with 51 (??) records after 4 batches, with the following reports from db.getLastErrorObj():

/* 0 */
{
"n" : 0,
"connectionId" : 166,
"err" : null,
"ok" : 1
}

/* 1 */
{
"updatedExisting" : false,
"upserted" : ObjectId("527141c72a1ae75210d3a705"),
"n" : 1,
"connectionId" : 166,
"err" : null,
"ok" : 1
}

/* 2 */
{
"updatedExisting" : true,
"n" : 1,
"connectionId" : 166,
"err" : null,
"ok" : 1
}

/* 3 */
{
"updatedExisting" : true,
"n" : 1,
"connectionId" : 166,
"err" : null,
"ok" : 1
}

My insertion code is the following:

var batchLimit = 50;
var messagesBatch = [];

function flushMessages() {
    print("* flushing... (" + messagesBatch.length + ")");
    var inserted = false; // so far
    do {
        db.messages.save(messagesBatch);
        var errObj = db.getLastErrorObj();
        print(errObj);
        if (errObj.ok && errObj.err === null) {
            // no error, fine
            inserted = true;
            messagesBatch.length = 0;
            print("* flushed. (" + messagesBatch.length + ")");
        }
        else {
            // insertion error !
            failedInsertions++;
            print(errObj);
        }
    } while (!inserted);
}

function addMessage(message) {
    messagesBatch.push(message);
    if (messagesBatch.length >= batchLimit) {
        flushMessages();
    }
    msgGenerated++;
    if (msgGenerated % 100000 == 0)
        print("* " + msgGenerated);
}

Can someone see why this code is upserting instead of inserting? What am I doing wrong?

Note: of course, the documents I'm inserting don't have an _id field.


Solution

The problem seems to come from emptying the array with messagesBatch.length = 0; to prepare the next batch. That truncates the existing array in place, so the next batch reuses the very same array object. When the batch is instead "reset" by creating a new array with messagesBatch = []; it works as expected.
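The difference between the two ways of emptying the batch can be seen in plain JavaScript, without MongoDB. The sketch below only demonstrates object identity: sameRef stands for a reference that some helper might still hold, and the tag property is a hypothetical stand-in for any state attached to the array object itself. The shell's save() is documented to insert when the object has no _id and to upsert when it has one, so if it stamps an _id onto the array it is handed (an assumption, not verified here), that state would survive length = 0 but not a fresh [].

```javascript
// Plain JavaScript, no MongoDB needed: compare the two ways of "emptying" a batch.
var batch = [{ text: "a" }, { text: "b" }];
var sameRef = batch;            // a second reference, like one a helper might keep
batch.tag = "set-by-helper";    // hypothetical extra property on the array object

// Technique 1: truncate in place. Same object, extra properties survive.
batch.length = 0;
var keptIdentity = (batch === sameRef);
var keptProperty = (batch.tag === "set-by-helper");
var otherRefAlsoEmpty = (sameRef.length === 0);

// Technique 2: rebind the variable to a brand-new array.
batch = [];
var newIdentity = (batch !== sameRef);
var propertyGone = (typeof batch.tag === "undefined");
```

After the first technique every holder of the old reference sees an empty array that is still the same object; after the second, the old object is left untouched and the next batch starts from a clean one.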

My guess is that the insertion is asynchronous and works directly on the array reference, so waiting for getLastErrorObj() is not enough to be sure that all data has been written. This seems wrong.

The empty 51st record came from a misplaced systematic "safety" flush of an empty array at the end of the script and was not related to the problem.
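Both points (a fresh array per batch, and no flush when nothing is pending) can be combined in a small batching helper. This is a minimal sketch, not the original script: makeBatcher and insertBatch are hypothetical names, and the write itself is injected as a function so the logic can be run without a database.

```javascript
// Sketch of a batcher: hands each full batch to a caller-supplied function,
// then starts a brand-new array instead of truncating the old one.
// makeBatcher and insertBatch are illustrative names, not mongo shell APIs.
function makeBatcher(batchLimit, insertBatch) {
    var batch = [];

    function flush() {
        if (batch.length === 0) {
            return 0;            // nothing pending: skip the write entirely
        }
        var toWrite = batch;     // keep the full batch for the writer
        batch = [];              // new array object for the next batch
        insertBatch(toWrite);
        return toWrite.length;
    }

    function add(message) {
        batch.push(message);
        if (batch.length >= batchLimit) {
            flush();
        }
    }

    return { add: add, flush: flush };
}

// Stand-in writer that just records what it was given.
var written = [];
var batcher = makeBatcher(50, function (docs) { written.push(docs); });
for (var i = 0; i < 120; i++) {
    batcher.add({ seq: i });
}
var lastFlushed = batcher.flush();   // final "safety" flush of the remainder
var emptyFlushed = batcher.flush();  // second flush has nothing to write
```

In the mongo shell the writer could be something like function (docs) { db.messages.insert(docs); }, assuming a shell version whose insert() accepts an array of documents; error handling and retries are left out of the sketch.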

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow