Question

It is first time I came across problem of long time of query execution. Problem is actually pretty big because query is executing in more then 20seconds which highly visible for endpoint user.

I have quite large database of topics (~8k), topic's have it's parameters (which is dictionared - I have 113 different parameters for 8k topics).

I would like to show report about number of repetitions of those topics.

topic table:
----------------+---------+-----------------------------------------------------
 id             | integer | nextval('topic_id_seq'::regclass)
 topicengine_id | integer |
 description    | text    |
 topicparam_id  | integer |
 date           | date    |

topicparam table:
----------------+---------+----------------------------------------------------------
 id             | integer | nextval('topicparam_id_seq'::regclass)
 name           | text    |

and my query:

select distinct tp.id as tpid, tp.name as desc, (select count(*) from topic where topic.topicparam_id = tp.id) as count, t.date
from topicparam tp, topic t where t.topicparam_id =tp.id

Total runtime: 22372.699 ms

fragment of result :

 tpid |                     topicname               | count |    date
------+---------------------------------------------+-------+---------
 3823 | Topic1                                      |     6 | 2014-03-01
 3756 | Topic2                                      |    14 | 2014-03-01
 3803 | Topic3                                      |    28 | 2014-04-01
 3780 | Topic4                                      |  1373 | 2014-02-01

Is there any way to optimize time of execution for this query?

Was it helpful?

Solution

A simply group by should do the same thing (if I understood your query correctly.

select tp.id as tpid, 
       max(tp.name) as desc, 
       count(*) as count, 
       max(t.date) as date
from topicparam tp
  join topic t on t.topicparam_id = tp.id
group by tp.id;

Btw: date is a horrible name for a column. For one reason because it's also a reserved word, but more importantly because it does not document what the column contains. A "start date", an "end date", a "due date", a "recording date", a "publish date", ...?

OTHER TIPS

You can try this query:

SELECT tp.id AS tpid,
       tp.name AS DESC,
       topic.cnt AS count,
       t.date
FROM topicparam tp
JOIN topic t
  ON t.topicparam_id =tp.id
JOIN (SELECT topicparam_id,
             count(*) cnt 
      FROM topic
      GROUP BY topicparam_id) topic
  ON topic.topicparam_id = tp.id
GROUP BY tp.id,
         tp.name,
         t.date,
         topic.cnt

For me DISTINCT + SUBQUERY are killing your performance. You should use GROUP BY in both way to "disinct" you data and "count".

SELECT 
    tp.id as tpid
    , tp.name as description
    , count(*) as numberOfTopics
    , t.date
FROM 
    topicparam tp
    INNER JOIN topic t 
        ON t.topicparam_id = tp.id
GROUP BY
    tp.id 
    , tp.name
    , t.date

Considering the bulk of data, you have to pay attention on indexes :

In this case, use indexes on topicparam.id and topic.id

Remove indexes on columns that is never use in join clauses.

Try to not use sql reserved words like "date, desc, count" for aliases or table fields.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top