Question

I have two tables in Postgres: polls and votes.

The first one, polls, holds the poll-related data. Every poll has exactly two possible responses, option_a and option_b:

+----+---------+----------+----------+------------+
| id | author  | option_a | option_b | created_at |
+----+---------+----------+----------+------------+
| 1  | user_01 | apple    | banana   | 2020/08/15 |
+----+---------+----------+----------+------------+
| 2  | user_02 | tea      | coffee   | 2020/08/16 |
+----+---------+----------+----------+------------+

The second one, votes, holds data about the votes:

+---------+---------+--------+------------+
| poll_id | voter   | option | voted_at   |
+---------+---------+--------+------------+
| 1       | user_01 | apple  | 2020/08/15 |
+---------+---------+--------+------------+
| 1       | user_02 | banana | 2020/08/15 |
+---------+---------+--------+------------+
| 1       | user_03 | banana | 2020/08/15 |
+---------+---------+--------+------------+
| 1       | user_04 | apple  | 2020/08/15 |
+---------+---------+--------+------------+
| 1       | user_05 | apple  | 2020/08/15 |
+---------+---------+--------+------------+
| 2       | user_01 | tea    | 2020/08/16 |
+---------+---------+--------+------------+
| 2       | user_08 | coffee | 2020/08/16 |
+---------+---------+--------+------------+

What I'm trying to do is select polls together with their vote counts. For example, when selecting the data for the poll with id = 1, I expect to get:

+---------+----------+----------+------------+
| poll_id | option_a | option_b | created_at |
+---------+----------+----------+------------+
| 1       | 3        | 2        | 2020/08/15 |
+---------+----------+----------+------------+

How can I compose such a query?


Solution

You can use filtered aggregation for this:

select v.poll_id, 
       count(*) filter (where v.option = p.option_a) as option_a, 
       count(*) filter (where v.option = p.option_b) as option_b,
       max(p.created_at) as created_at
from votes v 
  join polls p on p.id = v.poll_id
group by v.poll_id
order by v.poll_id;

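The max(p.created_at) is needed because the query groups by v.poll_id rather than by the primary key of polls, so p.created_at has to be aggregated. Since all rows of one poll share the same created_at, max() simply returns that value.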
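To check the query against the sample data from the question, the following sketch replays it with Python's built-in sqlite3 module, used here only as a stand-in for Postgres. It assumes SQLite 3.30 or later, which accepts the same aggregate FILTER syntax. It also runs the equivalent count(CASE ... END) form, which is how the same counts are written where FILTER is unavailable (Postgres before 9.4, for example): count() skips the NULLs that a CASE without ELSE produces.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database
# (a stand-in for Postgres; the FILTER clause needs SQLite 3.30+).
SCHEMA = """
CREATE TABLE polls (id INTEGER PRIMARY KEY, author TEXT, option_a TEXT,
                    option_b TEXT, created_at TEXT);
CREATE TABLE votes (poll_id INTEGER REFERENCES polls(id), voter TEXT,
                    option TEXT, voted_at TEXT);
INSERT INTO polls VALUES
  (1, 'user_01', 'apple', 'banana', '2020/08/15'),
  (2, 'user_02', 'tea',   'coffee', '2020/08/16');
INSERT INTO votes VALUES
  (1, 'user_01', 'apple',  '2020/08/15'),
  (1, 'user_02', 'banana', '2020/08/15'),
  (1, 'user_03', 'banana', '2020/08/15'),
  (1, 'user_04', 'apple',  '2020/08/15'),
  (1, 'user_05', 'apple',  '2020/08/15'),
  (2, 'user_01', 'tea',    '2020/08/16'),
  (2, 'user_08', 'coffee', '2020/08/16');
"""

# The filtered aggregation from the answer, unchanged.
FILTER_QUERY = """
select v.poll_id,
       count(*) filter (where v.option = p.option_a) as option_a,
       count(*) filter (where v.option = p.option_b) as option_b,
       max(p.created_at) as created_at
from votes v
  join polls p on p.id = v.poll_id
group by v.poll_id
order by v.poll_id
"""

# Same counts without FILTER: CASE yields NULL for non-matching rows,
# and count(expr) ignores NULLs.
CASE_QUERY = """
select v.poll_id,
       count(case when v.option = p.option_a then 1 end) as option_a,
       count(case when v.option = p.option_b then 1 end) as option_b,
       max(p.created_at) as created_at
from votes v
  join polls p on p.id = v.poll_id
group by v.poll_id
order by v.poll_id
"""

def run(query):
    """Load the sample data into a fresh database and return all result rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in run(FILTER_QUERY):
        print(row)
    # (1, 3, 2, '2020/08/15')
    # (2, 1, 1, '2020/08/16')
```

The first row matches the expected result from the question. Note that the inner join drops polls that have no votes at all.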

Online example

OTHER TIPS

One query for all polls

Use the aggregate FILTER clause, as a_horse already suggested.

SELECT p.id AS poll_id  -- ① PK column!
     , min(ct) FILTER (WHERE v.option = p.option_a) AS option_a 
     , min(ct) FILTER (WHERE v.option = p.option_b) AS option_b
     , p.created_at     -- ① no aggregate
FROM  polls p
LEFT  JOIN (  -- ②
   SELECT poll_id, option, count(*)::int AS ct
   FROM   votes
   GROUP  BY 1, 2
   ) v ON v.poll_id = p.id
GROUP  BY 1
ORDER  BY 1;

① Assuming that polls.id is the PRIMARY KEY, grouping by it covers polls.created_at (like every other column of polls, it is functionally dependent on the primary key), so it does not have to be aggregated.

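② Aggregate first, join later: the subquery reduces votes to one row per poll and option before the join, so the join handles a handful of rows per poll instead of one row per vote. That's typically faster.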

And use a LEFT JOIN to keep polls with no votes in the result. For such polls (or for an option nobody picked), the count comes back as NULL rather than 0.
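The effect of the LEFT JOIN can be seen by adding a poll without votes. This sketch again uses Python's sqlite3 module as a stand-in for Postgres (assuming SQLite 3.30+ for FILTER) and drops the Postgres-only ::int cast; poll 3 and its options are hypothetical, added purely for illustration.

```python
import sqlite3

# Question's sample data plus a hypothetical poll 3 without any votes.
# In-memory SQLite is a stand-in for Postgres (FILTER needs SQLite 3.30+).
SCHEMA = """
CREATE TABLE polls (id INTEGER PRIMARY KEY, author TEXT, option_a TEXT,
                    option_b TEXT, created_at TEXT);
CREATE TABLE votes (poll_id INTEGER, voter TEXT, option TEXT, voted_at TEXT);
INSERT INTO polls VALUES
  (1, 'user_01', 'apple', 'banana', '2020/08/15'),
  (2, 'user_02', 'tea',   'coffee', '2020/08/16'),
  (3, 'user_03', 'cats',  'dogs',   '2020/08/17');  -- hypothetical, no votes
INSERT INTO votes VALUES
  (1, 'user_01', 'apple',  '2020/08/15'),
  (1, 'user_02', 'banana', '2020/08/15'),
  (1, 'user_03', 'banana', '2020/08/15'),
  (1, 'user_04', 'apple',  '2020/08/15'),
  (1, 'user_05', 'apple',  '2020/08/15'),
  (2, 'user_01', 'tea',    '2020/08/16'),
  (2, 'user_08', 'coffee', '2020/08/16');
"""

# The answer's query; only the Postgres-specific ::int cast is dropped.
QUERY = """
SELECT p.id AS poll_id
     , min(ct) FILTER (WHERE v.option = p.option_a) AS option_a
     , min(ct) FILTER (WHERE v.option = p.option_b) AS option_b
     , p.created_at
FROM  polls p
LEFT  JOIN (
   -- aggregate first: one row per (poll, option)
   SELECT poll_id, option, count(*) AS ct
   FROM   votes
   GROUP  BY 1, 2
   ) v ON v.poll_id = p.id
GROUP  BY 1
ORDER  BY 1
"""

def poll_counts():
    """Load the data into a fresh database and return one row per poll."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in poll_counts():
        print(row)
    # (1, 3, 2, '2020/08/15')
    # (2, 1, 1, '2020/08/16')
    # (3, None, None, '2020/08/17')
```

Poll 3 survives the LEFT JOIN as a single row whose v columns are all NULL, so both FILTER conditions are false and min() over no rows yields NULL. If 0 is preferred, wrapping each aggregate in coalesce(..., 0) turns those NULLs into zeros.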

crosstab() query for one poll

crosstab(), provided by the additional module tablefunc (installed once per database with CREATE EXTENSION tablefunc;), is typically faster. In this case, however, with only two options per poll and the need to access table polls twice, it's probably slower:

SELECT v.*, p.created_at
FROM   crosstab(
   'SELECT poll_id, option, count(*)::int
    FROM   votes
    WHERE  poll_id = 1
    GROUP  BY 1, 2
    ORDER  BY 1, 2'

  ,'SELECT o.*
    FROM   polls, LATERAL (VALUES (option_a), (option_b)) o
    WHERE  id = 1'
   ) AS v (poll_id int, option_a int, option_b int)
JOIN polls p ON p.id = v.poll_id;


db<>fiddle here
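To show why the query above passes a second, category query to crosstab(), here is a hypothetical pure-Python model of how the two-argument form pivots rows; crosstab_sketch is an illustrative name, not tablefunc's implementation. Per the tablefunc documentation, each run of consecutive source rows with the same row_name becomes one output row, every value goes to the column whose position matches its category in the category list, categories without a row stay NULL, and source categories missing from the list are ignored. The one-argument crosstab(text), by contrast, ignores the category column and fills the output columns left to right, so a poll where nobody picked option_a would show option_b's count under option_a.

```python
from itertools import groupby

def crosstab_sketch(source_rows, categories):
    """Hypothetical model of tablefunc's crosstab(source_sql, category_sql).

    source_rows: (row_name, category, value) tuples, ordered by row_name,
                 like the rows returned by source_sql
    categories:  category values in output-column order, like category_sql
    """
    if len(set(categories)) != len(categories):
        # tablefunc also reports an error for duplicate categories
        raise ValueError("category list must not contain duplicates")
    position = {cat: i for i, cat in enumerate(categories)}
    result = []
    # Each run of consecutive rows with the same row_name becomes one output row.
    for row_name, group in groupby(source_rows, key=lambda r: r[0]):
        values = [None] * len(categories)  # categories without a row stay None (NULL)
        for _, category, value in group:
            if category in position:       # categories not in the list are ignored
                values[position[category]] = value
        result.append((row_name, *values))
    return result

if __name__ == "__main__":
    # Rows the answer's source query returns for poll 1, and the categories
    # its second query returns (option_a, option_b of poll 1).
    print(crosstab_sketch([(1, "apple", 3), (1, "banana", 2)], ["apple", "banana"]))
    # [(1, 3, 2)]
    # Hypothetical case: nobody voted for apple. banana's count still lands in
    # the option_b position; a left-to-right fill would put it under option_a.
    print(crosstab_sketch([(1, "banana", 2)], ["apple", "banana"]))
    # [(1, None, 2)]
```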

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange