Question

This question is updated with a simpler explanation.

For my data the following Mongo CLI query takes 208ms. This query retrieves ALL the data for the 18 requested objects.

db.videos.find({avg_rating: {$gt: 1}, poster_large_thumb: {$exists:true}, release_date: {$lte: ISODate("2000-12-31")}}).sort({release_date:-1, avg_rating:-1, title:1}).skip(30).limit(18).pretty().explain()

{
    "cursor" : "BasicCursor",
    "nscanned" : 76112,
    "nscannedObjects" : 76112,
    "n" : 48,
    "scanAndOrder" : true,
    "millis" : 208,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {

    }
}

However with Mongoid when I do the query it creates a criteria object that has no actual data. I pass this criteria to a method that iterates it to create a JSON data structure. Since the data is needed each object must be retrieved from the DB. Here is the criteria returning query:

@videos = Video.order_by(release_date: -1, avg_rating: -1, title: 1).where(:avg_rating.gt => 1, :poster_large_thumb.exists => 1, :release_date.lte => start_date).skip(skip*POSTERS_PER_ROW).limit(limit*POSTERS_PER_ROW)

When I iterate @videos, each object takes over 240ms to retrieve from the DB, this from some debug output.

Getting one object: 
2013-12-18 00:43:52 UTC
2013-12-18 00:43:52 UTC
0.24489331245422363

Assuming that if I got all the data in the Video.order_by(...) query it would take 208ms total how can I force it to do retrieval in one query instead of getting each objects individually?

Something here is causing the entire retrieval to take a couple of orders of magnitude more than the Mongo CLI.

Was it helpful?

Solution 2

Thanks for the ideas, I think @Arthur gave the important hint. No one could have answered because it looks like the problem was in another bit of code--in how I access the criteria.

Given the following query, which produces a criteria:

@videos = Video.order_by(release_date: 1, avg_rating: 1, title: -1).where(:release_date.ne => 0,:avg_rating.gt => 1, :poster_large_thumb.exists => 1, :release_date.gt => start_date).skip(skip*POSTERS_PER_ROW).limit(limit*POSTERS_PER_ROW).only(:_id, :poster_large_thumb, :title)  

In a couple nested block I'm grabbing values with a line of code like this:

video_full = @videos[((row_index)*POSTERS_PER_ROW)+column_index]

This random access notation seems to be the problem. It seems to execute the full Moped query for every individual object so POSTERS_PER_ROW*num_rows times.

If I grab all the videos before the loops with this bit of code:

@videos.each do |video|
    videos_full.push video
end

Then grab the values from an array instead of the criteria like this:

video_full = videos_full[((row_index)*POSTERS_PER_ROW)+column_index]

I get only one Moped query of 248ms, all objects are retrieved with that one query. This is a huge speed up. the query time goes from num_rows* POSTERS_PER_ROW*248ms to just 248ms

This is a big lesson learned for me so if someone can point to the docs that describe this effect and the rules to follow I'd sure appreciate it.

OTHER TIPS

Responses:

  1. skip().limit() queries tend to get slower and slower on MongoDB side. As skip walks through the docs, more info see here https://stackoverflow.com/a/7228190/534150

  2. The multiple identical queries look like to me a N+1 type of issue. That means that probably, somewhere in your view, you have a loop that calls a property that is lazy loaded, so it sends the query over and over again. Those problems are usually tricky to find, but to track them you need to have the end-to-end trace, which you are probably the only one that can do that, as you have access to the source code.

  3. the Mongoid side looks correct to me.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top