Question

I'm trying to use the explain method in both Rails 3 and 4 to estimate the number of rows returned by what can be a particularly expensive query. It joins 3 tables and can result in table scans of 10-million-row tables, which, combined with the count() aggregate, is particularly slow (the database is Postgres 9.3).

My problem is this: if I use the inbuilt explain() method, the query is always run IN FULL behind the scenes before a result is returned. This can take over 2 minutes. There may be other scenarios where the query I want to analyse could take hours to run (e.g. for reports).

I have a slightly ugly solution where I do a to_sql, tack "explain" on the front, and execute the query. This works in Rails 3 but required some rework for Rails 4.

So I suppose my question is this. Is there a way to get the inbuilt AR explain() method to do what I want, is there some other elegant way to do this, or is this a bug in AR::explain() which needs to be logged and fixed at some point?

Solution

Here is how I did this. In both Rails 3 and 4 I wrote an initializer that reopens ActiveRecord::Relation and overrides count.

First, in Rails 3:

class ActiveRecord::Relation
  # Estimates above this threshold are returned instead of running an exact count.
  HUGE_COUNT = 20000

  def count(column_name = nil, options = {})
    # The options hash may be passed as the first argument.
    h = (column_name.is_a?(Hash) ? column_name : options)
    exact = h[:exact]
    has_conditions = h[:conditions]
    has_distinct = column_name.is_a?(String) && column_name =~ /\bdistinct\b/i
    h = h.except(:exact) # Remove it because super won't understand it
    # Write the cleaned hash back so the bare super calls below pass it on.
    column_name.is_a?(Hash) ? column_name = h : options = h
    if exact || has_conditions || has_distinct
      super
    else
      est = estimated_count
      est > HUGE_COUNT ? est : super
    end
  end

  # Reads the planner's row estimate from the first line of the EXPLAIN output.
  def estimated_count
    node = connection.execute("EXPLAIN #{self.to_sql}").first
    match = node['QUERY PLAN'].match(/rows=(\d+)\b/)
    match ? match[1].to_i : 0
  end

end

Rails 4 is the same except for:

  def estimated_count
    node = {}
    # Run the EXPLAIN outside of prepared-statement mode.
    connection.unprepared_statement do
      node = connection.execute("EXPLAIN #{self.to_sql}").first
    end
    match = node['QUERY PLAN'].match(/rows=(\d+)\b/)
    match ? match[1].to_i : 0
  end

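HUGE_COUNT is low because so far I've found that the estimate is generally very accurate, to within 1 or 2%. That is fine for my needs, but obviously this is fairly dangerous, since count now silently returns an approximation for large result sets unless :exact is passed ...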
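The code above reads the estimate from the first line of the text plan with a regex. As an alternative sketch (not part of the original answer), PostgreSQL can also return the plan as JSON via EXPLAIN (FORMAT JSON), where the top-level node carries a "Plan Rows" field. The helper name plan_rows_from_json and the sample plan are made up for illustration, and how the JSON text is fetched from the connection is left out here.

```ruby
require 'json'

# Hypothetical helper: pull the planner's row estimate out of the JSON text
# produced by "EXPLAIN (FORMAT JSON) <query>" in PostgreSQL.
def plan_rows_from_json(plan_json)
  plans = JSON.parse(plan_json)
  top = plans.first
  # Fall back to 0 when the structure is not what we expect.
  return 0 unless top.is_a?(Hash) && top['Plan'].is_a?(Hash)
  top['Plan'].fetch('Plan Rows', 0).to_i
end

# Made-up sample output, trimmed to a few fields.
SAMPLE_PLAN = <<~PLAN
  [
    {
      "Plan": {
        "Node Type": "Seq Scan",
        "Relation Name": "orders",
        "Startup Cost": 0.00,
        "Total Cost": 183334.00,
        "Plan Rows": 10000000,
        "Plan Width": 4
      }
    }
  ]
PLAN

puts plan_rows_from_json(SAMPLE_PLAN)
```

Reading a named field avoids depending on the exact layout of the text plan, at the cost of parsing a slightly larger result.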

Other tips

I'm not sure if there's a method that will do this asynchronously. However, you can definitely benefit by using resque or sidekiq to run your queries asynchronously.

Here's the link to resque:

https://github.com/resque/resque

Here's the link to sidekiq:

https://github.com/mperham/sidekiq

The reason it's running the query in full is that ActiveRecord's explain is designed to run the query. It's not the same as a plain SQL EXPLAIN, which only plans the statement. In that respect it behaves more like a SQL EXPLAIN ANALYZE, which also executes the query.

As the documentation suggests,

explain actually executes the query, and then asks for the query plans.
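To make the distinction concrete: in PostgreSQL, a plain EXPLAIN only asks the planner for its plan, while EXPLAIN ANALYZE executes the statement. Below is a minimal sketch of building either statement from a SQL string, such as the output of to_sql. The helper name explain_statement and the sample query are hypothetical.

```ruby
# Hypothetical helper: wrap a SQL string in a PostgreSQL EXPLAIN statement.
# Plain EXPLAIN only returns the planner's estimates; adding ANALYZE makes
# PostgreSQL execute the statement and report actual figures.
def explain_statement(sql, analyze: false)
  prefix = analyze ? 'EXPLAIN ANALYZE' : 'EXPLAIN'
  # Drop surrounding whitespace and a trailing semicolon before prefixing.
  "#{prefix} #{sql.strip.chomp(';')}"
end

# Made-up query for illustration.
sql = 'SELECT * FROM orders WHERE total > 100;'
puts explain_statement(sql)                 # plan only, the query is not run
puts explain_statement(sql, analyze: true)  # the query is run
```

For an expensive query, only the first form is safe to send when all you want is the estimated row count.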

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow