Build a select query that includes aggregated values from a different table
06-03-2021
Question
I have two tables in Postgres: polls and votes.
The first one, polls, holds the poll-related data. Every poll has exactly two possible responses, option_a and option_b:
+----+---------+----------+----------+------------+
| id | author | option_a | option_b | created_at |
+----+---------+----------+----------+------------+
| 1 | user_01 | apple | banana | 2020/08/15 |
+----+---------+----------+----------+------------+
| 2 | user_02 | tea | coffee | 2020/08/16 |
+----+---------+----------+----------+------------+
The second one, votes, holds the data about the votes:
+---------+---------+--------+------------+
| poll_id | voter | option | voted_at |
+---------+---------+--------+------------+
| 1 | user_01 | apple | 2020/08/15 |
+---------+---------+--------+------------+
| 1 | user_02 | banana | 2020/08/15 |
+---------+---------+--------+------------+
| 1 | user_03 | banana | 2020/08/15 |
+---------+---------+--------+------------+
| 1 | user_04 | apple | 2020/08/15 |
+---------+---------+--------+------------+
| 1 | user_05 | apple | 2020/08/15 |
+---------+---------+--------+------------+
| 2 | user_01 | tea | 2020/08/16 |
+---------+---------+--------+------------+
| 2 | user_08 | coffee | 2020/08/16 |
+---------+---------+--------+------------+
What I'm trying to do is select polls together with their vote counts per option.
E.g. when selecting the data for the poll with id = 1, I expect to get:
+---------+----------+----------+------------+
| poll_id | option_a | option_b | created_at |
+---------+----------+----------+------------+
| 1 | 3 | 2 | 2020/08/15 |
+---------+----------+----------+------------+
How can I compose such a query?
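To try the queries in the answers, the two tables can be recreated with a setup script along these lines. This is only a sketch: the column types, the constraints, and the choice of date for created_at and voted_at are assumptions, since the question shows sample rows but no table definitions.

```sql
-- Assumed schema: types and constraints are guesses based on the sample rows.
CREATE TABLE polls (
    id         int  PRIMARY KEY,
    author     text NOT NULL,
    option_a   text NOT NULL,
    option_b   text NOT NULL,
    created_at date NOT NULL
);

CREATE TABLE votes (
    poll_id  int  NOT NULL REFERENCES polls (id),
    voter    text NOT NULL,
    option   text NOT NULL,
    voted_at date NOT NULL,
    PRIMARY KEY (poll_id, voter)  -- assumption: one vote per voter per poll
);

-- Sample rows copied from the question.
INSERT INTO polls (id, author, option_a, option_b, created_at) VALUES
    (1, 'user_01', 'apple', 'banana', '2020-08-15'),
    (2, 'user_02', 'tea',   'coffee', '2020-08-16');

INSERT INTO votes (poll_id, voter, option, voted_at) VALUES
    (1, 'user_01', 'apple',  '2020-08-15'),
    (1, 'user_02', 'banana', '2020-08-15'),
    (1, 'user_03', 'banana', '2020-08-15'),
    (1, 'user_04', 'apple',  '2020-08-15'),
    (1, 'user_05', 'apple',  '2020-08-15'),
    (2, 'user_01', 'tea',    '2020-08-16'),
    (2, 'user_08', 'coffee', '2020-08-16');
```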
Solution
You can use filtered aggregation for this:
select v.poll_id,
count(*) filter (where v.option = p.option_a) as option_a,
count(*) filter (where v.option = p.option_b) as option_b,
max(p.created_at) as created_at
from votes v
join polls p on p.id = v.poll_id
group by v.poll_id
order by v.poll_id;
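The max(p.created_at) is necessary because the query groups by v.poll_id only, so every other selected column has to be wrapped in an aggregate. Each poll has exactly one created_at, so max() simply returns that value.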
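To get only the poll with id = 1, as asked in the question, the same query can be narrowed with a WHERE clause. This is a minimal sketch that assumes the sample data from the question. Because it uses an inner join, a poll without any votes would return no row at all.

```sql
-- Same filtered aggregation, restricted to a single poll.
-- The FILTER clause requires PostgreSQL 9.4 or later.
select v.poll_id,
       count(*) filter (where v.option = p.option_a) as option_a,
       count(*) filter (where v.option = p.option_b) as option_b,
       max(p.created_at) as created_at
from votes v
  join polls p on p.id = v.poll_id
where v.poll_id = 1
group by v.poll_id;
```

With the sample rows this should produce one row: poll_id 1, option_a 3, option_b 2, created_at 2020-08-15.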
OTHER TIPS
One query for all polls
Use the aggregate FILTER clause, like a_horse already suggested:
SELECT p.id AS poll_id -- ① PK column!
, min(ct) FILTER (WHERE v.option = p.option_a) AS option_a
, min(ct) FILTER (WHERE v.option = p.option_b) AS option_b
, p.created_at -- ① no aggregate
FROM polls p
LEFT JOIN ( -- ②
SELECT poll_id, option, count(*)::int AS ct
FROM votes
GROUP BY 1, 2
) v ON v.poll_id = p.id
GROUP BY 1
ORDER BY 1;
① Assuming that polls.id is the PRIMARY KEY, polls.created_at is functionally dependent on the GROUP BY column and does not have to be aggregated.
② Aggregate first, join later. That's typically faster. The LEFT JOIN keeps polls with no votes in the result.
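One detail of the query above: for an option that received no votes, min(ct) FILTER (...) finds no matching row and returns NULL rather than 0. If zeros are preferred, each count can be wrapped in COALESCE. The following is a sketch of that variant, again assuming polls.id is the primary key.

```sql
SELECT p.id AS poll_id
     , COALESCE(min(v.ct) FILTER (WHERE v.option = p.option_a), 0) AS option_a
     , COALESCE(min(v.ct) FILTER (WHERE v.option = p.option_b), 0) AS option_b
     , p.created_at                      -- covered by the PK in GROUP BY
FROM   polls p
LEFT   JOIN (                            -- aggregate votes first, join later
   SELECT poll_id, option, count(*)::int AS ct
   FROM   votes
   GROUP  BY poll_id, option
   ) v ON v.poll_id = p.id
GROUP  BY p.id
ORDER  BY p.id;
```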
crosstab() query for one poll
crosstab(), provided by the additional module tablefunc, is typically faster. In this case, however, with only two options per poll and the need to access table polls twice, it's probably slower:
SELECT v.*, p.created_at
FROM crosstab(
'SELECT poll_id, option, count(*)::int
FROM votes
WHERE poll_id = 1
GROUP BY 1, 2
ORDER BY 1, 2'
,'SELECT o.*
FROM polls, LATERAL (VALUES (option_a), (option_b)) o
WHERE id = 1'
) AS v (poll_id int, option_a int, option_b int)
JOIN polls p ON p.id = v.poll_id;
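Since crosstab() comes from the tablefunc module, the extension has to be installed in the current database before the query above will run. A minimal sketch, assuming the role has the privileges needed to create extensions:

```sql
-- Install the module once per database; does nothing if already present.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Optional check that the extension is registered.
SELECT extname, extversion
FROM   pg_extension
WHERE  extname = 'tablefunc';
```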