Can this array group count query be improved?
31-10-2019
Question
So I have the following query:
explain analyze
with tags as (
select unnest(tags) as tag_name from tasks where user_id = 1
) select
count(9),
tag_name
from
tags
group by
tag_name
order by
count(9) desc
limit 50
It gives me the following plan:
Limit (cost=3243.86..3243.99 rows=50 width=32) (actual time=2.278..2.278 rows=1 loops=1)
CTE tags
-> Bitmap Heap Scan on tasks (cost=12.35..1917.72 rows=52700 width=13) (actual time=0.098..2.074 rows=261 loops=1)
Recheck Cond: (user_id = 1)
-> Bitmap Index Scan on index_tasks_user_id (cost=0.00..12.22 rows=527 width=0) (actual time=0.065..0.065 rows=261 loops=1)
Index Cond: (user_id = 1)
-> Sort (cost=1326.14..1326.64 rows=200 width=32) (actual time=2.278..2.278 rows=1 loops=1)
Sort Key: (count(9))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=1317.50..1319.50 rows=200 width=32) (actual time=2.273..2.274 rows=1 loops=1)
-> CTE Scan on tags (cost=0.00..1054.00 rows=52700 width=32) (actual time=0.099..2.177 rows=261 loops=1)
Total runtime: 2.314 ms
Which is pretty decent, I suppose. The previous way of doing things was to have a bunch of join tables, and that gave me something like this:
Limit (cost=919.38..919.40 rows=50 width=12) (actual time=163.164..163.257 rows=50 loops=1)
-> Sort (cost=919.38..919.48 rows=206 width=12) (actual time=163.162..163.194 rows=50 loops=1)
Sort Key: (count(*))
Sort Method: top-N heapsort Memory: 28kB
-> HashAggregate (cost=917.39..918.01 rows=206 width=12) (actual time=162.899..163.008 rows=132 loops=1)
-> Nested Loop (cost=456.90..917.19 rows=206 width=12) (actual time=1.040..162.361 rows=416 loops=1)
-> Hash Join (cost=456.90..904.32 rows=206 width=4) (actual time=1.029..159.429 rows=416 loops=1)
Hash Cond: (taggings.workout_id = workouts.id)
-> Seq Scan on taggings (cost=0.00..416.64 rows=40214 width=8) (actual time=0.010..45.753 rows=37029 loops=1)
-> Hash (cost=455.91..455.91 rows=282 width=4) (actual time=1.004..1.004 rows=293 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Bitmap Heap Scan on workouts (cost=4.49..455.91 rows=282 width=4) (actual time=0.101..0.744 rows=293 loops=1)
Recheck Cond: (user_id = 1)
-> Bitmap Index Scan on index_workouts_on_user_id (cost=0.00..4.48 rows=282 width=0) (actual time=0.058..0.058 rows=293 loops=1)
Index Cond: (user_id = 1)
-> Index Scan using tags_pkey on tags (cost=0.00..0.06 rows=1 width=16) (actual time=0.003..0.004 rows=1 loops=416)
Index Cond: (id = taggings.tag_id)
Total runtime: 163.393 ms
Now forget about the last explain and let's focus on the first one. Can it be optimized further? Are there any tricks I might be missing? I guess an index on the user_id column should be plenty for this query?
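One thing worth noting about the first plan: on PostgreSQL 11 and earlier, a non-recursive `with` clause is an optimization fence, so the planner materializes the CTE and cannot fold it into the outer aggregation. A hedged sketch of an equivalent query that inlines the unnest instead, assuming the same tasks(user_id, tags) schema as in the post (also, count(9) counts the non-null constant 9 once per row, so it is equivalent to count(*)):

```sql
-- Hedged sketch, not from the original post: inline the unnest via
-- LATERAL so the whole query is planned as one unit instead of a
-- materialized CTE. count(*) replaces count(9); they are equivalent.
select
    t.tag_name,
    count(*) as tag_count
from
    tasks,
    lateral unnest(tasks.tags) as t(tag_name)
where
    tasks.user_id = 1
group by
    t.tag_name
order by
    count(*) desc
limit 50;
```

On PostgreSQL 12 and later, non-recursive CTEs that are referenced once are inlined automatically, so this rewrite mainly matters on older versions.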
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange