Question

I have a rake task that processes a set of records and saves them in another collection:

batch = []

Record.where(:type => 'a').each do |r|
  batch << make_score(r)

  if batch.size % 100 == 0
    Score.collection.insert(batch)
    batch = []
  end
end

I'm processing about 100K records at a time. Unfortunately, after about 20 minutes I get a "Query response returned CURSOR_NOT_FOUND" error.

The MongoDB FAQ says to use skip and limit, or to turn off cursor timeouts; with skip and limit, the whole thing was about 2-3 times slower.

How can I turn off timeouts in conjunction with mongoid?


Solution

The MongoDB docs say you can pass in a timeout boolean; if timeout is false, the cursor will never time out:

collection.find({"type" => "a"}, {:timeout=>false})

In your case:

Record.collection.find({:type=>'a'}, :timeout => false).each ...

I also recommend you look into map-reduce with Mongo. It seems tailor-made for this sort of collection manipulation: http://www.mongodb.org/display/DOCS/MapReduce
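To give a sense of the shape of map-reduce, here is a pure-Ruby, in-memory emulation of what it does over records like these (the `type` and `value` fields are hypothetical; in MongoDB the map and reduce steps would be JavaScript functions passed to the driver):

```ruby
# In-memory stand-in for the Record documents (hypothetical fields).
records = [
  { type: 'a', value: 10 },
  { type: 'a', value: 20 },
  { type: 'b', value: 5 }
]

# "map" phase: emit one key/value pair per record.
emitted = records.map { |r| [r[:type], r[:value]] }

# "reduce" phase: fold together all values emitted under the same key.
totals = emitted.group_by(&:first).transform_values do |pairs|
  pairs.sum { |_key, value| value }
end

totals # => { 'a' => 30, 'b' => 5 }
```

In real map-reduce the reduce step runs server-side and incrementally, which is why it avoids the long-lived client cursor that hits the timeout in the first place.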

OTHER TIPS

In mongoid 3 you can use this:

ModelName.all.no_timeout.each do |m|
  # do something with the model
end

Which is pretty handy.

It does seem, for now at least, that you have to go the long route and query via the Mongo driver:

Mongoid.database[collection.name].find({ a_query }, { :timeout => false }) do |cursor| 
  cursor.each do |row| 
    do_stuff 
  end 
end

Here is the workaround I did: create an array to hold the full records, and work from that array, like this:

products = []

Product.all.each do |p|
  products << p
end

products.each do |p|
  # Do your magic
end

Dumping all the records into the array will most likely finish before the timeout, unless you are working on an extremely large number of records. It will also consume a lot of memory if you are dealing with large or very many records, so keep that in mind.
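The batching idea from the question can also be written with each_slice, which hands over the final partial batch as well (the plain array, the `make_score` lambda, and the 100-record batch size below are stand-ins for the question's query, scoring method, and insert):

```ruby
BATCH_SIZE = 100

# Stand-in for Record.where(:type => 'a').to_a (hypothetical data).
records = (1..250).to_a

# Hypothetical scoring step standing in for make_score(r).
make_score = ->(r) { { record_id: r, score: r * 2 } }

inserted_batches = []

records.each_slice(BATCH_SIZE) do |slice|
  batch = slice.map { |r| make_score.call(r) }
  # In the real task this would be Score.collection.insert(batch);
  # note the last slice here holds the leftover 50 records, which the
  # `batch.size % 100 == 0` check in the question would never flush.
  inserted_batches << batch
end

inserted_batches.map(&:size) # => [100, 100, 50]
```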

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow