Question

I am trying to run a massive number of db.update() calls against my MongoDB via a stream, using the native Node.js mongodb driver.

What I have looks like this:

stream.on('data', function(data){
   db.collection('my-db').update({_id:data.id},{$set:{notes:data.notes}},{upsert:true},
   function(err,res){
      if(err) throw err;
      console.log(res);
   })
})

My stream reads a CSV file that is over 1.5 million rows, and the process crashes before completion.

I understand that async.whilst can do this, but I am running into problems using it. I have tried:

stream.on('data', function(data){
   var counter = 0;
   async.whilst(
      function(){ return counter < 10; },
      function(cb){
         counter++;
         var doc = {id: data.id, notes: data.notes};
         db.collection('my-db').update({_id: doc.id}, {$set: {notes: doc.notes}}, {upsert: true}, function(err, res){
            if(err) throw err;
            console.log(res);
            counter--;
            cb();
         });
      },
      function(err){
         //do something
      }
   );
});

However, this does not seem to throttle the connections to my database or keep my Node program from crashing.

Any help would be greatly appreciated.

Solution

Okay, I overcame this issue by using the async module, specifically async.queue.

I was able to create this queue:

var q = async.queue(function(task, cb){
    // the worker completes immediately; the queue allows at most 50 tasks in flight
    cb();
}, 50);

q.drain = function() {
    console.log('all items have been processed');
};

From my stream:

q.push({name: dbUpdate}, function(err){
   // runs once the worker has processed the task; the actual update happens here
   db.collection('my-db').update({id: data.id}, {$set: {notes: data.notes}}, {upsert: true}, function(err, res){
      //do something
   });
});
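
An alternative arrangement (a rough, untested sketch, reusing the same stream, db, collection name, and async module as above) is to perform the update inside the queue worker itself and only call cb() once the write finishes, so the limit of 50 applies directly to the in-flight MongoDB calls:

var q = async.queue(function(task, cb){
    // the worker performs the write, so at most 50 updates are in flight at once
    db.collection('my-db').update({_id: task.id}, {$set: {notes: task.notes}}, {upsert: true}, function(err, res){
        if(err) console.error(err);
        cb();   // signal the queue only after the write has returned
    });
}, 50);

q.drain = function() {
    console.log('all items have been processed');
};

stream.on('data', function(data){
    q.push({id: data.id, notes: data.notes});
});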

I am positive now that my problem was more on the side of Node's mongodb driver.
Thanks everyone!

Other tips

In order to divide and conquer this problem, how about you either skip reading from the file and just loop over 1.5 million "counts", or do read from the file but do not call MongoDB at all? What I am trying to get at is figuring out whether this is a MongoDB driver error or an async error.
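
For example, something along these lines (a rough, untested sketch; testMongoOnly and testStreamOnly are made-up names, and db and stream are assumed to be the same objects as in the question):

// Test 1: no file involved -- fire the same kind of update 1.5 million times in a plain loop.
// If this alone crashes, the unbounded number of in-flight driver calls is the problem.
function testMongoOnly(db){
    for (var i = 0; i < 1500000; i++){
        db.collection('my-db').update({_id: i}, {$set: {notes: 'test'}}, {upsert: true}, function(err){
            if(err) console.error(err);
        });
    }
}

// Test 2: read the CSV stream, but never touch MongoDB.
// If this completes cleanly, reading the file is not the problem.
function testStreamOnly(stream){
    var rows = 0;
    stream.on('data', function(){ rows++; });
    stream.on('end', function(){
        console.log('read ' + rows + ' rows without calling the database');
    });
}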

Alex Lerner

License: CC-BY-SA with attribution