Question

I have a complicated (to me) SQL query in PostgreSQL 9.2.4 using generate_series and multiple joins. I need to sum the reps for all exercises on a particular day from the exercises table, and make sure that those exercises belong to workouts done by the current user. Finally, I need to join that result to a series of dates (from generate_series) so that missing dates are displayed too.

My idea was to select the series in the FROM clause and then left join the series to a subquery holding the result of an inner join between the exercises and workouts tables. For example, I have the following query:

SELECT 
    DISTINCT date_trunc('day', series.date)::date as date,
    sum(COALESCE(reps, 0)) OVER WIN,
    array_agg(workout_id) OVER WIN as ids     
FROM (
    select generate_series(-22, 0) + current_date as date
) series 
LEFT JOIN (
    exercises INNER JOIN workouts 
    ON exercises.workout_id = workouts.id
) 
ON series.date = exercises.created_at::date 
WINDOW 
   WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;

This gives the following output:

    date    | sum |                           ids                           
------------+-----+---------------------------------------------------------
 2013-04-27 |   0 | {NULL}
 2013-04-28 | 432 | {49,48,47,46,45,44,43,42,41,38,37,36,36,36,36,35,34,33}
 2013-04-29 |   0 | {NULL}
 2013-04-30 |  20 | {50}
 2013-05-01 |   0 | {NULL}
 2013-05-02 |   0 | {NULL}
 2013-05-03 |   0 | {NULL}
 2013-05-04 |   0 | {NULL}
 2013-05-05 |   0 | {NULL}
 2013-05-06 |   0 | {NULL}
 2013-05-07 |  40 | {51,51}
 2013-05-08 |   0 | {NULL}
 2013-05-09 |   0 | {NULL}
 2013-05-10 |   0 | {NULL}
 2013-05-11 |   0 | {NULL}
 2013-05-12 |   0 | {NULL}
 2013-05-13 |   0 | {NULL}
 2013-05-14 |   0 | {NULL}
 2013-05-15 |   0 | {NULL}
 2013-05-16 |  20 | {52}
 2013-05-17 |   0 | {NULL}
 2013-05-18 |   0 | {NULL}
 2013-05-19 |   0 | {NULL}
(23 rows)

However, I want to filter by certain conditions:

WHERE workouts.user_id = 5

for example.

But if I put a WHERE clause into the query above with that condition, the output is like this:

    date    | sum |                           ids                           
------------+-----+---------------------------------------------------------
 2013-04-28 | 432 | {49,48,47,46,45,44,43,42,41,38,37,36,36,36,36,35,34,33}
 2013-04-30 |  20 | {50}
 2013-05-07 |  40 | {51,51}
 2013-05-16 |  20 | {52}
(4 rows)

The series goes away.

How can I filter by user_id and keep the series? Any help would be much appreciated.


Solution 2

Instead of joining against all rows of the workouts table, you can apply the condition inside the join itself, by filtering workouts in a subquery:

SELECT 
    DISTINCT date_trunc('day', series.date)::date as date,
    sum(COALESCE(reps, 0)) OVER WIN,
    array_agg(workout_id) OVER WIN as ids     
FROM (
    select generate_series(-22, 0) + current_date as date
) series 
LEFT JOIN (
    exercises INNER JOIN (select * from workouts where user_id = 5) workouts 
    ON exercises.workout_id = workouts.id
) 
ON series.date = exercises.created_at::date 
WINDOW 
   WIN AS (PARTITION BY date_trunc('day', series.date)::date)
ORDER BY date ASC;

I think this should give you the output you are looking for.
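
Why the plain WHERE clause makes the series disappear, while a condition inside the join keeps it, can be sketched with a miniature example. This is only an illustration under assumed data: the CTE names series_days and logged and their rows are hypothetical, not the asker's tables.

```sql
-- Hypothetical data: three days; user 5 logged reps on the 2nd, user 7 on the 3rd.
-- Variant 1: filter in WHERE. It runs after the outer join, so the
-- NULL-extended rows (user_id IS NULL) are thrown away with the rest.
WITH series_days(day) AS (
   VALUES (date '2013-05-01'), (date '2013-05-02'), (date '2013-05-03')
), logged(day, user_id, reps) AS (
   VALUES (date '2013-05-02', 5, 20)
        , (date '2013-05-03', 7, 10)
)
SELECT s.day, l.reps
FROM   series_days s
LEFT   JOIN logged l ON l.day = s.day
WHERE  l.user_id = 5
ORDER  BY s.day;            -- 1 row: only 2013-05-02

-- Variant 2: filter in the join condition. Rows of logged that fail the
-- condition simply do not match, and every day of the series survives.
WITH series_days(day) AS (
   VALUES (date '2013-05-01'), (date '2013-05-02'), (date '2013-05-03')
), logged(day, user_id, reps) AS (
   VALUES (date '2013-05-02', 5, 20)
        , (date '2013-05-03', 7, 10)
)
SELECT s.day, l.reps
FROM   series_days s
LEFT   JOIN logged l ON l.day = s.day AND l.user_id = 5
ORDER  BY s.day;            -- 3 rows: reps is NULL for 05-01 and 05-03
```

Filtering workouts in a subquery before the join, as in the query above, has the same effect as the second variant.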

OTHER TIPS

I have a complicated (to me) SQL query ...

Indeed, you do. But it doesn't have to be that way:

SELECT s.day
      ,COALESCE(sum(e.reps), 0) AS sum_reps  -- reps is in exercises, per the question
      ,array_agg(e.workout_id)  AS ids
FROM   exercises e
JOIN   workouts  w ON w.id = e.workout_id AND w.user_id = 5
RIGHT  JOIN (
   SELECT now()::date + generate_series(-22, 0) AS day
   ) s ON s.day = e.created_at::date 
GROUP  BY 1
ORDER  BY 1;

Major points:

  • RIGHT [OUTER] JOIN is the reverse twin of LEFT JOIN. Since joins are applied left-to-right, you don't need parentheses this way.

  • Never use the basic type and function name date as an identifier. I substituted day.

  • Update: To avoid NULL in the result of the aggregate / window function sum(), use an outer COALESCE as demonstrated below, COALESCE(sum(reps), 0), rather than the inner form:

    sum(COALESCE(reps, 0))

  • You don't need date_trunc() at all. s.day is a date to begin with, so this is redundant:

    date_trunc('day', s.day)::date AS day

  • Instead of the complicated and comparatively expensive combination of DISTINCT + window functions, you can just use a simple GROUP BY in this case.

Aggregate functions and COALESCE()

There has been confusion with this in a number of questions recently.

Generally, sum() or other aggregate functions ignore NULL values. The result is the same as if the value wasn't there at all. However, there are a number of special cases. The manual advises:

It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect, and array_agg returns null rather than an empty array when there are no input rows. The coalesce function can be used to substitute zero or an empty array for null when necessary.

This demo should serve to clarify by demonstrating the corner cases:

  • 1 table with no rows.
  • 3 tables with 1 row holding (NULL / 0 / 1)
  • 3 tables with 2 rows holding NULL and (NULL / 0 / 1)

Test setup

-- no rows
CREATE TABLE t_empty (i int);
-- INSERT nothing

CREATE TABLE t_0 (i int);
CREATE TABLE t_1 (i int);
CREATE TABLE t_n (i int);

-- 1 row
INSERT INTO t_0 VALUES (0);
INSERT INTO t_1 VALUES (1);
INSERT INTO t_n VALUES (NULL);

CREATE TABLE t_0n (i int);
CREATE TABLE t_1n (i int);
CREATE TABLE t_nn (i int);

-- 2 rows
INSERT INTO t_0n VALUES (0),    (NULL);
INSERT INTO t_1n VALUES (1),    (NULL);
INSERT INTO t_nn VALUES (NULL), (NULL);

Query

SELECT 't_empty'           AS tbl
      ,count(*)            AS ct_all
      ,count(i)            AS ct_i
      ,sum(i)              AS simple_sum
      ,sum(COALESCE(i, 0)) AS inner_coalesce
      ,COALESCE(sum(i), 0) AS outer_coalesce
FROM   t_empty

UNION ALL
SELECT 't_0',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_0
UNION ALL
SELECT 't_1',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_1
UNION ALL
SELECT 't_n',  count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_n

UNION ALL
SELECT 't_0n', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_0n
UNION ALL
SELECT 't_1n', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_1n
UNION ALL
SELECT 't_nn', count(*), count(i)
      ,sum(i), sum(COALESCE(i, 0)), COALESCE(sum(i), 0) FROM t_nn;

Result

   tbl   | ct_all | ct_i | simple_sum | inner_coalesce | outer_coalesce
---------+--------+------+------------+----------------+----------------
 t_empty |      0 |    0 |     <NULL> |         <NULL> |              0
 t_0     |      1 |    1 |          0 |              0 |              0
 t_1     |      1 |    1 |          1 |              1 |              1
 t_n     |      1 |    0 |     <NULL> |              0 |              0
 t_0n    |      2 |    1 |          0 |              0 |              0
 t_1n    |      2 |    1 |          1 |              1 |              1
 t_nn    |      2 |    0 |     <NULL> |              0 |              0


Ergo, my initial advice was sloppy. You may need COALESCE with sum().
But if you do, use an outer COALESCE. The inner COALESCE in your original query doesn't cover all corner cases and is rarely useful.
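
The same outer COALESCE pattern also covers array_agg(), which the manual quote above mentions. A minimal sketch, assuming an empty input faked with a subquery (the alias no_rows and the column i are hypothetical):

```sql
-- Aggregating over zero rows still returns exactly one result row.
SELECT sum(i)                              AS simple_sum      -- NULL
     , COALESCE(sum(i), 0)                 AS outer_coalesce  -- 0
     , array_agg(i)                        AS simple_agg      -- NULL
     , COALESCE(array_agg(i), '{}'::int[]) AS agg_or_empty    -- {} (empty array)
FROM  (SELECT 1 AS i WHERE false) AS no_rows;
```

In the main query above, the outer join always supplies at least one NULL-extended row per day, which is why ids shows {NULL} rather than NULL there.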

Licensed under: CC-BY-SA with attribution