Question

Possible Duplicate:
Optimization of count query for PostgreSQL

Using PostgreSQL 9.2, we are trying to figure out if there is a way to keep track of the number of results for a query, and return that number in an efficient manner. This query should be executed several (possibly tens to hundreds or even thousands) of times per second. The query we have right now looks like this, but we wonder if this is inefficient:

-- Get # of rows that do not have ‘parameter value’ in array_column
select count(*) 
    from table
    where not (ARRAY[‘parameter value’] <@ table.array_column)

My questions are (an answer might solve multiple problems at the same time):

Is the count(id) (or count(*) for that matter) for that query a linear (O(n)) query?

Is there some way to make this query more efficient in PostgreSQL? Please keep in mind that we need to query for different parameter values, so I believe keeping a materialized view for it is not feasible (although, we might consider creating one for each parameter value if that is considered to be better).

Is there any change I should make to the query, database structure or the configuration of my PostgreSQL server that might help me improve the query performance?

Any pointers or suggestions will be greatly appreciated. If this is a completely wrong way to do this, please let me know. Thanks very much!

Edit

Taking into consideration what was answered, I was wondering if it would be plausible to use materialized views. By this I mean having several materialized views (each one for a different parameter value, having the rows where that value is not present). We the parameter values are, to a certain extent, predictable, so this doesn't seem too far out there as a solution. This brings another matter into question: Would materialized views help here? Is there some limitation (either in definition or performance) as to the number of materialized views (or tables) that I can create in a database?

Thanks very much for your help!

Was it helpful?

Solution

The first idea that come to mind is to cache the value.

You should evaluate the rate of change of this value, and depending on that decide if you want to have a trigger to be executed when this table is updated to compute the new value and cache it somewhere.

The resulting query for that value would be a simple SELECT without any WHERE clause, making it very fast.

Or you could simply do the change, and get some stats before and after to know if you've gained in speed.

See there for further explanations.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top