Question

I have approximately 10 million Article documents in a MongoDB database, accessed through Mongoid. With that many documents, the queries below become very time consuming.

As shown below, for each week (e.g. 700 days ago, ..., 7 days ago, 0 days ago) I record how many articles are in the database up to that date.

But each query takes longer than the last, and the database process's CPU usage quickly climbs past 100%.

articles = Article.where(published: true).asc(:datetime)
days     = Date.today.mjd - articles.first.datetime.to_date.mjd

# Step back from the oldest article towards today, one week at a time.
days.step(0, -7) do |n|
  current_date           = Date.today - n.days
  previous_articles      = articles.lt(datetime: current_date)
  # Each .size call issues a separate count query against the database.
  previous_good_articles = previous_articles.where(good: true).size
  previous_bad_articles  = previous_articles.where(good: false).size
end

Is there a way to keep the Article objects in memory, so that I only need to query the database on the first line?

Solution

A MongoDB database is not built for that.

I think the best approach is to run a daily script that computes your figures for that day and saves them in a Redis database (http://www.redis.io).

Redis keeps your data in the server's memory, so you can access it at any time of day, and it is very fast.
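A minimal sketch of such a daily job, assuming the redis gem and a Redis server on localhost; the "articles:<date>" key layout is illustrative, not prescribed:

require 'date'
require 'redis'

redis = Redis.new # defaults to localhost:6379

today = Date.today
scope = Article.where(published: true).lt(datetime: today)

# One hash per day holding both counters, e.g. "articles:2013-05-01".
redis.hset("articles:#{today}", 'good', scope.where(good: true).count)
redis.hset("articles:#{today}", 'bad',  scope.where(good: false).count)

Reading the figures back is then a single in-memory lookup, e.g. redis.hgetall("articles:#{today}"), which returns both counters (as strings) without touching MongoDB at all.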

OTHER TIPS

Don't Repeat Yourself (DRY) is a best practice that applies not only to code but also to processing. Many applications have natural epochs for summarizing data; a day is a good choice in your case, and since historical data never changes, it only has to be summarized once. That reduces the processing of 10 million Article documents down to roughly 700 day-summary documents. You need special code for merging in today's data if you want up-to-the-moment accuracy, but the savings are well worth the effort.
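As a rough illustration, assuming the Article model from the question, the one-off historical pass could look like this (the summaries array is just an in-memory stand-in for whatever store you choose):

first_day = Article.where(published: true).asc(:datetime).first.datetime.to_date

# One pair of count queries per day, run once, instead of once per report.
summaries = (first_day..Date.today).map do |day|
  day_scope = Article.where(published: true,
                            :datetime.gte => day,
                            :datetime.lt  => day + 1)
  { date: day,
    good: day_scope.where(good: true).count,
    bad:  day_scope.where(good: false).count }
end

The cumulative weekly figures from the question then fall out as running sums over summaries, so the 10 million documents are scanned once rather than once per week.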

I politely disagree with the statement, "A MongoDB database is not built for that." As the above shows, the point is simply to avoid repeating processing. The 700 day-summary documents can be stored in any reasonable data store. Since you are already using MongoDB, simply use another MongoDB collection for the day summaries; there is no need to spin up another data store if you don't want to. The summary data will easily fit in memory, and the reduction in processing means your working set will no longer be blown out by the historical queries.
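For instance, a hypothetical DaySummary model (the name and fields are mine, not from the question) kept in its own collection could hold the figures computed in the sketch above:

class DaySummary
  include Mongoid::Document
  store_in collection: 'day_summaries' # separate collection, same database

  field :date, type: Date
  field :good, type: Integer
  field :bad,  type: Integer
end

# Upsert one document per day so the job can be re-run safely
# (summaries is the array built in the previous sketch).
summaries.each do |s|
  summary = DaySummary.find_or_create_by(date: s[:date])
  summary.update_attributes(good: s[:good], bad: s[:bad])
end

Reporting then queries the 700 small summary documents instead of the 10 million articles.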

Licensed under: CC-BY-SA with attribution