Question

I have a table called classifications that contains a classification_indicator_id column.
I need to sum this ID per day over a 1-day series.
I need to add around 20 columns (one for each additional classification_indicator_id).
I slightly modified the answer from my previous question:

select
data.d::date as "data",
sum(c.classification_indicator_id)::integer as "Segment1",
sum(c4.classification_indicator_id)::integer as "Segment2",
sum(c5.classification_indicator_id)::integer as "Segment3"
from 
  generate_series(
    '2013-03-25'::timestamp without time zone,
    '2013-04-01'::timestamp without time zone,
    '1 day'::interval
) data(d)
left join classifications c on (data.d::date = c.created::date and c.classification_indicator_id = 3)
left join classifications c4 on (data.d::date = c4.created::date and c4.classification_indicator_id = 4)
left join classifications c5 on (data.d::date = c5.created::date and c5.classification_indicator_id = 5)
group by "data"
ORDER BY "data"

But it is still not working properly. The sum in each row is too big, and it grows when I add more columns. In the second table (4 columns), Segment1 for 2013-03-26 should be the same as in the first table, and so on.

 With 3 columns                     With 4 columns
data       | Segment1 | Segment2   data       | Segment1 | Segment2 | Segment3
--------------------------------   -------------------------------------------
2013-03-25 | 12       | 16         2013-03-25 | 12       | 16       | 20
--------------------------------   -------------------------------------------
2013-03-26 | 18       | 24         2013-03-26 | 108      | 144      | 180    

Solution

As commented under your previous answer, you are running into a "proxy cross join".
I explained it in more detail in this related answer:
Two SQL LEFT JOINS produce incorrect result
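
To make the effect concrete, here is a minimal sketch with made-up sample data (the five rows in the CTE are hypothetical, not taken from your table). Each additional LEFT JOIN on the same day multiplies the rows that are already there, so every sum gets inflated by the row counts of the other joins:

```sql
-- Hypothetical data: 2 rows with id 3 and 3 rows with id 4 on the same day.
WITH classifications(created, classification_indicator_id) AS (
   VALUES ('2013-03-26'::date, 3)
        , ('2013-03-26', 3)
        , ('2013-03-26', 4)
        , ('2013-03-26', 4)
        , ('2013-03-26', 4)
   )
SELECT count(*)                            AS joined_rows  -- 6  (2 x 3 combinations)
     , sum(c3.classification_indicator_id) AS segment1     -- 18 (correct would be 6)
     , sum(c4.classification_indicator_id) AS segment2     -- 24 (correct would be 12)
FROM  (SELECT '2013-03-26'::date AS created) d
LEFT   JOIN classifications c3 ON c3.created = d.created
                              AND c3.classification_indicator_id = 3
LEFT   JOIN classifications c4 ON c4.created = d.created
                              AND c4.classification_indicator_id = 4;
```

Every id-3 row is paired with every id-4 row of the same day, which is why the numbers keep growing with each column you add.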

Your query should work like this:

SELECT d.created AS data
      ,c3.segment1
      ,c4.segment2
      ,c5.segment3
FROM (
   SELECT generate_series('2013-03-25'::date
                         ,'2013-04-01'::date
                         ,interval '1 day')::date AS created
    ) d
LEFT JOIN (
    SELECT created
          ,sum(classification_indicator_id)::integer AS segment1
    FROM   classifications
    WHERE  classification_indicator_id = 3
    GROUP  BY 1
    ) c3 USING (created)
LEFT JOIN (
    SELECT created
          ,sum(classification_indicator_id)::integer AS segment2
    FROM   classifications
    WHERE  classification_indicator_id = 4
    GROUP  BY 1
    ) c4 USING (created)
LEFT JOIN (
    SELECT created
          ,sum(classification_indicator_id)::integer AS segment3
    FROM   classifications
    WHERE  classification_indicator_id = 5
    GROUP  BY 1
    ) c5 USING (created)
ORDER  BY 1;

Assuming that created is a date, not a timestamp.
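
If created is actually a timestamp (the created::date casts in the question suggest it might be), one way to adapt the query is to cast inside each subquery, so that the aggregation and the join both operate on the day. A sketch with hypothetical sample rows, shown for one segment only:

```sql
-- Hypothetical data: created is a timestamp here, not a date.
WITH classifications(created, classification_indicator_id) AS (
   VALUES ('2013-03-26 09:15'::timestamp, 3)
        , ('2013-03-26 17:40', 3)
        , ('2013-03-27 08:00', 3)
   )
SELECT d.created AS data
     , c3.segment1
FROM  (
   SELECT generate_series('2013-03-25'::date
                        , '2013-03-27'::date
                        , interval '1 day')::date AS created
   ) d
LEFT   JOIN (
   SELECT created::date AS created   -- truncate to the day before aggregating
        , sum(classification_indicator_id)::integer AS segment1
   FROM   classifications
   WHERE  classification_indicator_id = 3
   GROUP  BY 1                       -- positional: groups by the casted day
   ) c3 USING (created)
ORDER  BY 1;
```

GROUP BY 1 matters in this variant: GROUP BY created would pick the input column (the timestamp) rather than the casted output column of the same name.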

Or, for an even faster query, since this has become a topic:

SELECT d.created AS data
      ,count(classification_indicator_id = 3 OR NULL)::int * 3 AS segment1
      ,count(classification_indicator_id = 4 OR NULL)::int * 4 AS segment2
      ,count(classification_indicator_id = 5 OR NULL)::int * 5 AS segment3
FROM (
   SELECT generate_series('2013-03-25'::date
                         ,'2013-04-01'::date
                         ,interval '1 day')::date AS created
    ) d
LEFT   JOIN classifications c USING (created)
GROUP  BY 1
ORDER  BY 1;
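
The count(... OR NULL) expressions rely on count() ignoring NULL values: true OR NULL is true, while false OR NULL is NULL, so only matching rows are counted. A small sketch on a hypothetical VALUES list, next to the FILTER clause, which does the same thing but requires PostgreSQL 9.4 or later:

```sql
-- Hypothetical ids: two rows with 3, one each with 4 and 5.
SELECT count(id = 3 OR NULL)          AS via_or_null  -- 2
     , count(*) FILTER (WHERE id = 3) AS via_filter   -- 2
     , count(id = 3)                  AS plain_count  -- 4: false is not NULL, so it is counted too
FROM  (VALUES (3), (3), (4), (5)) t(id);
```

Since every counted row carries the same id, multiplying the count by that id gives the same result as summing the ids.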

OTHER TIPS

No need for joins:

select
    data.d::date as "data",
    sum((classification_indicator_id = 3)::integer * classification_indicator_id)::integer as "Segment1",
    sum((classification_indicator_id = 4)::integer * classification_indicator_id)::integer as "Segment2",
    sum((classification_indicator_id = 5)::integer * classification_indicator_id)::integer as "Segment3"
from 
    generate_series(
        '2013-03-25'::timestamp without time zone,
        '2013-04-01'::timestamp without time zone,
        '1 day'::interval
    ) data(d)
    left join
    classifications c on data.d::date = c.created::date
group by "data"
ORDER BY "data"
Licensed under: CC-BY-SA with attribution