Question

What's the most efficient way to iterate through an entire table using DataMapper?

If I do this, does DataMapper try to pull the entire result set into memory before performing the iteration? Assume, for the sake of argument, that I have millions of records and that this is infeasible:

Author.all.each do |a|
  puts a.title
end

Is there a way to tell DataMapper to load the results in chunks? Is it smart enough to do this automatically?


Solution

DataMapper will run just one SQL query for the example above, so it has to keep the whole result set in memory.

I think you should use some sort of pagination if your collection is big. Using dm-pagination, you could do something like this:

require 'dm-pagination' # provides Model.page and the pager object

PAGE_SIZE = 20

pager = Author.page(:per_page => PAGE_SIZE).pager # this runs a count query
(1..pager.total_pages).each do |page_number|
  # each page is fetched with its own LIMIT/OFFSET query
  Author.page(:per_page => PAGE_SIZE, :page => page_number).each do |a|
    puts a.title
  end
end

You can play around with different values for PAGE_SIZE to find a good trade-off between the number of SQL queries and memory usage.
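If you need this in more than one place, the same pattern can be pulled into a small helper. This is only a sketch built on the dm-pagination calls shown above; the method name each_in_pages is made up for illustration:

# Hypothetical helper (not part of dm-pagination): yields every record
# of `model`, loading only `per_page` records per SQL query.
def each_in_pages(model, per_page = 100)
  pager = model.page(:per_page => per_page).pager # count query
  (1..pager.total_pages).each do |page_number|
    model.page(:per_page => per_page, :page => page_number).each do |record|
      yield record
    end
  end
end

each_in_pages(Author, 500) { |a| puts a.title }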

OTHER TIPS

Thanks, Nicolas, I actually came up with a similar solution. I've accepted your answer since it makes use of DataMapper's dm-pagination system, but I'm wondering if this would do equally well (or worse):

CHUNK = 20
offset = 0

# fetch one chunk per query until no records are left
while (authors = Author.slice(offset, CHUNK)) && !authors.empty?
  authors.each do |a|
    # do something with a
  end
  offset += CHUNK
end
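One thing to keep in mind with both approaches is that OFFSET-based queries tend to get slower as the offset grows, since the database still has to walk past all the skipped rows. If the model has an auto-incrementing Serial :id key, an id-based (keyset) loop avoids that. A rough sketch using plain DataMapper query options and the same CHUNK constant, assuming an integer id key:

last_id = 0

loop do
  # only rows after the last id seen so far, in ascending id order
  authors = Author.all(:id.gt => last_id, :order => [:id.asc], :limit => CHUNK)
  break if authors.empty?

  authors.each do |a|
    # do something with a
  end
  last_id = authors.last.id
end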

What you want is the dm-chunked_query plugin (example from the docs):

require 'dm-chunked_query'

MyModel.each_chunk(20) do |chunk|
  chunk.each do |resource|
    # ...
  end
end

This will allow you to iterate over all the records in the model, in chunks of 20 records at a time.

EDIT: the example above originally had an unnecessary extra #each after #each_chunk. The gem author updated the README example, and I changed the code above to match.
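For the Author example from the question, that might look like the following; the chunk size of 1_000 is arbitrary and should be tuned to your memory budget:

require 'dm-chunked_query'

Author.each_chunk(1_000) do |chunk|
  chunk.each do |author|
    puts author.title
  end
end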
