Can this array group count query be improved?
31-10-2019
Question
So I have the following query:
explain analyze
with tags as (
select unnest(tags) as tag_name from tasks where user_id = 1
) select
count(9),
tag_name
from
tags
group by
tag_name
order by
count(9) desc
limit 50
It gives me the following plan:
Limit (cost=3243.86..3243.99 rows=50 width=32) (actual time=2.278..2.278 rows=1 loops=1)
CTE tags
-> Bitmap Heap Scan on tasks (cost=12.35..1917.72 rows=52700 width=13) (actual time=0.098..2.074 rows=261 loops=1)
Recheck Cond: (user_id = 1)
-> Bitmap Index Scan on index_tasks_user_id (cost=0.00..12.22 rows=527 width=0) (actual time=0.065..0.065 rows=261 loops=1)
Index Cond: (user_id = 1)
-> Sort (cost=1326.14..1326.64 rows=200 width=32) (actual time=2.278..2.278 rows=1 loops=1)
Sort Key: (count(9))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=1317.50..1319.50 rows=200 width=32) (actual time=2.273..2.274 rows=1 loops=1)
-> CTE Scan on tags (cost=0.00..1054.00 rows=52700 width=32) (actual time=0.099..2.177 rows=261 loops=1)
Total runtime: 2.314 ms
Which is pretty decent, I suppose. The previous way of doing things was to have a bunch of join tables, and that gave me something like this:
Limit (cost=919.38..919.40 rows=50 width=12) (actual time=163.164..163.257 rows=50 loops=1)
-> Sort (cost=919.38..919.48 rows=206 width=12) (actual time=163.162..163.194 rows=50 loops=1)
Sort Key: (count(*))
Sort Method: top-N heapsort Memory: 28kB
-> HashAggregate (cost=917.39..918.01 rows=206 width=12) (actual time=162.899..163.008 rows=132 loops=1)
-> Nested Loop (cost=456.90..917.19 rows=206 width=12) (actual time=1.040..162.361 rows=416 loops=1)
-> Hash Join (cost=456.90..904.32 rows=206 width=4) (actual time=1.029..159.429 rows=416 loops=1)
Hash Cond: (taggings.workout_id = workouts.id)
-> Seq Scan on taggings (cost=0.00..416.64 rows=40214 width=8) (actual time=0.010..45.753 rows=37029 loops=1)
-> Hash (cost=455.91..455.91 rows=282 width=4) (actual time=1.004..1.004 rows=293 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Bitmap Heap Scan on workouts (cost=4.49..455.91 rows=282 width=4) (actual time=0.101..0.744 rows=293 loops=1)
Recheck Cond: (user_id = 1)
-> Bitmap Index Scan on index_workouts_on_user_id (cost=0.00..4.48 rows=282 width=0) (actual time=0.058..0.058 rows=293 loops=1)
Index Cond: (user_id = 1)
-> Index Scan using tags_pkey on tags (cost=0.00..0.06 rows=1 width=16) (actual time=0.003..0.004 rows=1 loops=416)
Index Cond: (id = taggings.tag_id)
Total runtime: 163.393 ms
Now forget about the last explain and let's focus on the first one. Can it be optimized further? Are there any tricks I might be missing? I guess an index on the user_id column should be plenty for this query?
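One thing worth noting about the first plan: on PostgreSQL 11 and earlier, a non-recursive `with` clause is an optimization fence, so the planner materializes the CTE and cannot fold it into the outer aggregation. A hedged sketch of an equivalent query that inlines the unnest instead, assuming the same tasks(user_id, tags) schema as in the post (also, count(9) counts the non-null constant 9 once per row, so it is equivalent to count(*)):

```sql
-- Hedged sketch, not from the original post: inline the unnest via
-- LATERAL so the whole query is planned as one unit instead of a
-- materialized CTE. count(*) replaces count(9); they are equivalent.
select
    t.tag_name,
    count(*) as tag_count
from
    tasks,
    lateral unnest(tasks.tags) as t(tag_name)
where
    tasks.user_id = 1
group by
    t.tag_name
order by
    count(*) desc
limit 50;
```

On PostgreSQL 12 and later, non-recursive CTEs that are referenced once are inlined automatically, so this rewrite mainly matters on older versions.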
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange