Build a select query that includes aggregated values from a different table

https://dba.stackexchange.com/questions/273861

06-03-2021
Question

I have two tables in Postgres: polls and votes.

The first one, polls, contains the poll-related data. Every poll has exactly two possible responses: option_a and option_b.

+----+---------+----------+----------+------------+
| id | author  | option_a | option_b | created_at |
+----+---------+----------+----------+------------+
| 1  | user_01 | apple    | banana   | 2020/08/15 |
+----+---------+----------+----------+------------+
| 2  | user_02 | tea      | coffee   | 2020/08/16 |
+----+---------+----------+----------+------------+

The second one, votes, holds data about the votes:

+---------+---------+--------+------------+
| poll_id | voter   | option | voted_at   |
+---------+---------+--------+------------+
| 1       | user_01 | apple  | 2020/08/15 |
+---------+---------+--------+------------+
| 1       | user_02 | banana | 2020/08/15 |
+---------+---------+--------+------------+
| 1       | user_03 | banana | 2020/08/15 |
+---------+---------+--------+------------+
| 1       | user_04 | apple  | 2020/08/15 |
+---------+---------+--------+------------+
| 1       | user_05 | apple  | 2020/08/15 |
+---------+---------+--------+------------+
| 2       | user_01 | tea    | 2020/08/16 |
+---------+---------+--------+------------+
| 2       | user_08 | coffee | 2020/08/16 |
+---------+---------+--------+------------+

What I'm trying to do is select polls together with their vote counts. E.g., when selecting the data for the poll with id = 1, I expect to get:

+---------+----------+----------+------------+
| poll_id | option_a | option_b | created_at |
+---------+----------+----------+------------+
| 1       | 3        | 2        | 2020/08/15 |
+---------+----------+----------+------------+

How can I compose such a query?
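
To experiment with the answers below, the two tables can be recreated roughly as follows. This is only a sketch: the column types, the primary key, and the foreign key are assumptions, since the question shows the data but not the table definitions.

```sql
-- Assumed schema; types and constraints are not stated in the question.
CREATE TABLE polls (
   id         int  PRIMARY KEY
 , author     text NOT NULL
 , option_a   text NOT NULL
 , option_b   text NOT NULL
 , created_at date NOT NULL
);

CREATE TABLE votes (
   poll_id  int  NOT NULL REFERENCES polls (id)
 , voter    text NOT NULL
 , option   text NOT NULL
 , voted_at date NOT NULL
);

-- Sample rows copied from the tables in the question.
INSERT INTO polls (id, author, option_a, option_b, created_at) VALUES
   (1, 'user_01', 'apple', 'banana', '2020-08-15')
 , (2, 'user_02', 'tea'  , 'coffee', '2020-08-16');

INSERT INTO votes (poll_id, voter, option, voted_at) VALUES
   (1, 'user_01', 'apple' , '2020-08-15')
 , (1, 'user_02', 'banana', '2020-08-15')
 , (1, 'user_03', 'banana', '2020-08-15')
 , (1, 'user_04', 'apple' , '2020-08-15')
 , (1, 'user_05', 'apple' , '2020-08-15')
 , (2, 'user_01', 'tea'   , '2020-08-16')
 , (2, 'user_08', 'coffee', '2020-08-16');
```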

Solution

You can use filtered aggregation for this:

select v.poll_id, 
       count(*) filter (where v.option = p.option_a) as option_a, 
       count(*) filter (where v.option = p.option_b) as option_b,
       max(p.created_at) as created_at
from votes v 
  join polls p on p.id = v.poll_id
group by v.poll_id
order by v.poll_id;

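The max(p.created_at) is needed because the query groups by v.poll_id only, so p.created_at must either be aggregated or appear in the GROUP BY clause. Each poll has a single created_at value, so max() simply returns that value.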
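
Since the question asks for a single poll, the same query can be narrowed with a WHERE clause. A minimal sketch, assuming the table and column names from the question:

```sql
-- Vote counts for one poll (id = 1) using filtered aggregation.
SELECT v.poll_id,
       count(*) FILTER (WHERE v.option = p.option_a) AS option_a,
       count(*) FILTER (WHERE v.option = p.option_b) AS option_b,
       max(p.created_at) AS created_at
FROM   votes v
JOIN   polls p ON p.id = v.poll_id
WHERE  v.poll_id = 1          -- rows are filtered before grouping
GROUP  BY v.poll_id;
```

With the sample data from the question, this should return the expected row (3 for option_a, 2 for option_b). Note that because of the inner join, a poll without any votes produces no row at all.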

OTHER TIPS

One query for all polls

Use the aggregate FILTER clause, like a_horse already suggested. See:

Return counts for multiple ranges in a single SELECT statement

SELECT p.id AS poll_id  -- ① PK column!
     , min(ct) FILTER (WHERE v.option = p.option_a) AS option_a 
     , min(ct) FILTER (WHERE v.option = p.option_b) AS option_b
     , p.created_at     -- ① no aggregate
FROM  polls p
LEFT  JOIN (  -- ②
   SELECT poll_id, option, count(*)::int AS ct
   FROM   votes
   GROUP  BY 1, 2
   ) v ON v.poll_id = p.id
GROUP  BY 1
ORDER  BY 1;

① Assuming that polls.id is the PRIMARY KEY, polls.created_at is covered this way and does not have to be aggregated. See:

Select first row (grouping) + add aggregate function

② Aggregate first, join later. That's typically faster. See:

Slow queries related to subqueries using aggregation

And LEFT JOIN to keep polls with no votes in the result.
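
One detail worth noting: for a poll with no votes (or with votes for only one option), min(ct) FILTER (...) finds no matching row and yields NULL rather than 0. If zeros are preferred, wrapping each count in COALESCE is one way to get them. A sketch under the same assumption that polls.id is the PRIMARY KEY:

```sql
SELECT p.id AS poll_id
     , COALESCE(min(v.ct) FILTER (WHERE v.option = p.option_a), 0) AS option_a
     , COALESCE(min(v.ct) FILTER (WHERE v.option = p.option_b), 0) AS option_b
     , p.created_at               -- covered by the PK in GROUP BY
FROM   polls p
LEFT   JOIN (                     -- aggregate first, join later
   SELECT poll_id, option, count(*)::int AS ct
   FROM   votes
   GROUP  BY 1, 2
   ) v ON v.poll_id = p.id
GROUP  BY p.id                    -- assumes polls.id is the PRIMARY KEY
ORDER  BY p.id;
```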

crosstab() query for one poll

crosstab() provided by the additional module tablefunc is typically faster. In this case, however, with only two options per poll and the need to access table polls twice, it's probably slower:

SELECT v.*, p.created_at
FROM   crosstab(
   'SELECT poll_id, option, count(*)::int
    FROM   votes
    WHERE  poll_id = 1
    GROUP  BY 1, 2
    ORDER  BY 1, 2'

  ,'SELECT o.*
    FROM   polls, LATERAL (VALUES (option_a), (option_b)) o
    WHERE  id = 1'
   ) AS v (poll_id int, option_a int, option_b int)
JOIN polls p ON p.id = v.poll_id;

See:

PostgreSQL Crosstab Query
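
The crosstab() function is not available until the tablefunc module has been installed in the current database. Assuming sufficient privileges, that is typically done once per database:

```sql
-- Installs the additional module that provides crosstab().
CREATE EXTENSION IF NOT EXISTS tablefunc;
```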

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange
