I'm working on a dashboard page that does a lot of analytics to display both graphical and tabular data to users.
When the dashboard is filtered by a given year, I have to display analytics for the selected year, another year chosen for comparison, and historical averages from all time.
For the selected and comparison years, I create start/end DateTime objects that are set to the beginning_of_year and end_of_year.
year = Model.where("closed_at >= ?", start_date).where("closed_at <= ?", end_date).to_a
comp = Model.where("closed_at >= ?", comp_start).where("closed_at <= ?", comp_end).to_a
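Since the two queries differ only in their date range, a single parameterized scope can cover both (the scope name `closed_in` here is invented, not from my app). A plain-Ruby sketch of building the year ranges, assuming whole-day endpoints:

```ruby
require "date"

# Sketch: both queries differ only in the range, so one scope
# (the name closed_in is hypothetical) removes the duplication:
#   scope :closed_in, ->(range) { where(closed_at: range) }
#   year = Model.closed_in(year_range(2014))
#   comp = Model.closed_in(year_range(2013))

# Plain-Ruby construction of the ranges; ActiveSupport's
# beginning_of_year/end_of_year produce the same calendar endpoints.
def year_range(year)
  Date.new(year, 1, 1)..Date.new(year, 12, 31)
end
```

Note that `end` is a Ruby keyword, which is why the endpoint variables need names like `end_date`.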
These queries are essentially the same, just different date filters. I don't really see any way to optimize this besides trying to only "select(...)" the fields I need, which will probably be all of them.
Since there will be an average of 250-1000 records in a given year, they aren't "horrible" (in my not-very-skilled opinion).
However, the historical averages are causing me a lot of pain. In order to adequately show the averages, I have to query ALL the records for all time and perform calculations on them. This is a bad idea, but I don't know how to get around it.
all_for_average = Model.all
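For a plain all-time average, ActiveRecord's calculation methods push the aggregation into the database, so no rows ever need to be loaded into Ruby (the `amount` column here is assumed for illustration):

```ruby
# ActiveRecord pushes this into PostgreSQL as SELECT AVG(amount) ...,
# returning a single number instead of every row:
#   Model.average("amount")
#
# Plain-Ruby illustration of what AVG(amount) computes, over sample data:
records = [{ amount: 100 }, { amount: 250 }, { amount: 400 }]
avg = records.sum { |r| r[:amount] }.to_f / records.size
```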
Surely people have run into these kinds of problems before and have some means of optimizing them? Returning somewhere in the ballpark of 2,000 - 50,000 records for historical average analysis can't be very efficient. However, I don't see another way to perform the analysis unless I first retrieve the records.
Option 1: Grab everything and filter using Ruby
Since I'm already grabbing everything via Model.all, I "could" remove the two year queries and simply filter the desired records out of the historical set instead. But this seems wrong... I'm essentially "downloading" my DB and then querying it with Ruby code instead of SQL, which seems very inefficient. Has anyone tried this before and seen any performance gains?
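Option 1 in miniature: once every record is loaded, the per-year subsets are plain Enumerable filters. A runnable illustration with sample hashes standing in for loaded records (field names assumed):

```ruby
require "date"

# Stand-ins for records loaded via Model.all:
all_records = [
  { closed_at: Date.new(2013, 5, 1), amount: 100 },
  { closed_at: Date.new(2014, 3, 2), amount: 200 },
  { closed_at: Date.new(2014, 9, 9), amount: 300 },
]

# Filtering in Ruby walks the whole array every time, where the
# database could have used an index on closed_at instead:
year_2014 = all_records.select { |r| r[:closed_at].year == 2014 }
```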
Option 2: Using multiple SQL DB calls to get select information
This would mean instead of grabbing all records for a given time period, I would make several DB queries to get the "answers" from the DB instead of analyzing the data in Ruby.
Instead of running something like this,
year = Model.where("closed_at >= ?", start_date).where("closed_at <= ?", end_date).to_a
I would perform multiple queries:
year_total_count     = Model.where(closed_at: date_range).count
year_amount_sum      = Model.where(closed_at: date_range).sum(:amount)
year_count_per_month = Model.where(closed_at: date_range).group("date_trunc('month', closed_at)").count
...other queries to extract selected info...
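Several of those calls can often collapse into one grouped query, since PostgreSQL can return the count and sum per month in a single round trip (`date_trunc` is the Postgres spelling; `MONTH()` is MySQL). A hedged sketch, with the ActiveRecord call in comments and a runnable demonstration of deriving the totals from the grouped rows:

```ruby
# Sketch (column and variable names assumed, not verified against the app):
#   per_month = Model.where(closed_at: date_range)
#                    .group("date_trunc('month', closed_at)")
#                    .pluck("date_trunc('month', closed_at), COUNT(*), SUM(amount)")
#
# Sample of what such a query might return: [month, count, sum] rows.
per_month = [[3, 10, 1000.0], [4, 8, 900.0], [5, 12, 1500.0]]

# The year totals then fall out in Ruby without extra queries:
total_count = per_month.sum { |_, count, _| count }
total_sum   = per_month.sum { |_, _, sum| sum }
```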
Again, this seems very inefficient, but I'm not knowledgeable enough about SQL and Ruby performance to know which approach has obvious downsides.
I "can" code both routes and compare them, but it would take a few days to build and benchmark them, since there's a lot of information on the dashboard page that I'm leaving out here. Surely these situations come up all the time for dashboard/analytics pages; is there a general principle for handling them?
I'm using PostgreSQL on Rails 4. I've been looking into DB-specific solutions as well, since staying "database agnostic" really is irrelevant for most applications.
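One Postgres-specific route worth considering for the all-time averages is a materialized view (available since PostgreSQL 9.3): precompute the monthly aggregates once, refresh them on a schedule, and the dashboard reads a handful of summary rows instead of scanning every record. A sketch, with the view, table, and column names invented for illustration:

```sql
-- Hypothetical schema: precomputed monthly aggregates.
CREATE MATERIALIZED VIEW monthly_stats AS
SELECT date_trunc('month', closed_at) AS month,
       COUNT(*)                       AS closed_count,
       SUM(amount)                    AS amount_sum,
       AVG(amount)                    AS amount_avg
FROM models
GROUP BY 1;

-- Refresh periodically (cron, a rake task, etc.) or after bulk imports:
REFRESH MATERIALIZED VIEW monthly_stats;
```

The trade-off is staleness: the view only reflects data as of the last refresh, which is usually acceptable for historical averages.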