Postgres - how to return rows with a count of 0 for missing data?

https://stackoverflow.com/questions/346132

19-08-2019

Question

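I have unevenly distributed data (by application date) spanning several years (2003-2008). I want to query the data for a given start and end date, grouping it by any of the supported intervals (day, week, month, quarter, year) in PostgreSQL 8.3 (http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC).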

The problem is that some queries return results that are continuous over the required period, like this one:

select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id) 
from some_table where category_id=1 and entity_id = 77  and entity2_id = 115 
and date <= '2008-12-06' and date >= '2007-12-01' group by 
date_trunc('month',date) order by date_trunc('month',date);
          to_char   | count 
        ------------+-------
         2007-12-01 |    64
         2008-01-01 |    31
         2008-02-01 |    14
         2008-03-01 |    21
         2008-04-01 |    28
         2008-05-01 |    44
         2008-06-01 |   100
         2008-07-01 |    72
         2008-08-01 |    91
         2008-09-01 |    92
         2008-10-01 |    79
         2008-11-01 |    65
        (12 rows)

But some of them miss some intervals because no data exists for them, like this one:

select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id) 
from some_table where category_id=1 and entity_id = 75  and entity2_id = 115 
and date <= '2008-12-06' and date >= '2007-12-01' group by 
date_trunc('month',date) order by date_trunc('month',date);

      to_char   | count 
    ------------+-------
     2007-12-01 |     2
     2008-01-01 |     2
     2008-03-01 |     1
     2008-04-01 |     2
     2008-06-01 |     1
     2008-08-01 |     3
     2008-10-01 |     2
    (7 rows)

where the desired result set is:

  to_char   | count 
------------+-------
 2007-12-01 |     2
 2008-01-01 |     2
 2008-02-01 |     0
 2008-03-01 |     1
 2008-04-01 |     2
 2008-05-01 |     0
 2008-06-01 |     1
 2008-07-01 |     0
 2008-08-01 |     3
 2008-09-01 |     0
 2008-10-01 |     2
 2008-11-01 |     0
(12 rows)

that is, with a count of 0 for the missing entries.

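I have seen earlier discussions on Stack Overflow, but they don't seem to solve my problem, because my grouping period is one of (day, week, month, quarter, year) and is decided at runtime by the application. So an approach such as a left join with a calendar table or a sequence table would not help, I guess.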

My current solution is to fill in these gaps in Python (in a TurboGears app) using the calendar module.
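A client-side fill of this kind might look roughly like the sketch below. It is only an illustration: it uses `datetime` rather than the `calendar` module, handles monthly buckets only, and the helper names `month_starts` and `fill_missing` are hypothetical. The rows are the seven returned by the second query; the end date is set to 2008-11-30 so that the result matches the 12 rows of the desired output.

```python
from datetime import date


def month_starts(start, end):
    """Yield the first day of every month from start's month through end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_missing(rows, start, end):
    """Return (month, count) pairs for every month in range, using 0 where no row exists."""
    counts = dict(rows)
    return [(m, counts.get(m, 0)) for m in month_starts(start, end)]


# Rows as returned by the second query (months without data are absent).
rows = [
    (date(2007, 12, 1), 2),
    (date(2008, 1, 1), 2),
    (date(2008, 3, 1), 1),
    (date(2008, 4, 1), 2),
    (date(2008, 6, 1), 1),
    (date(2008, 8, 1), 3),
    (date(2008, 10, 1), 2),
]

filled = fill_missing(rows, date(2007, 12, 1), date(2008, 11, 30))
for month, count in filled:
    print(month, count)
```

Supporting the other intervals this way means writing a separate stepping rule for each of them, which is the part a database-side solution can take over.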

Is there a better way to do this?

Solution

You can create a list of the first days of all months of the last year (say) with

select distinct date_trunc('month', (current_date - offs)) as date 
from generate_series(0,365,28) as offs;
          date
------------------------
 2007-12-01 00:00:00+01
 2008-01-01 00:00:00+01
 2008-02-01 00:00:00+01
 2008-03-01 00:00:00+01
 2008-04-01 00:00:00+02
 2008-05-01 00:00:00+02
 2008-06-01 00:00:00+02
 2008-07-01 00:00:00+02
 2008-08-01 00:00:00+02
 2008-09-01 00:00:00+02
 2008-10-01 00:00:00+02
 2008-11-01 00:00:00+01
 2008-12-01 00:00:00+01

Then you can join against that series.
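As to why a 28-day step is enough here: no month is shorter than 28 days, so stepping back 28 days at a time cannot skip a month, and DISTINCT removes the months that are hit twice. The Python sketch below mirrors the query's logic outside the database. The function name `month_starts_back` is hypothetical, and the fixed `today` value is an assumption chosen to line up with the sample output rather than something stated in the answer.

```python
from datetime import date, timedelta


def month_starts_back(today, days=365, step=28):
    # Same offsets as generate_series(0, 365, 28): 0, 28, ..., 364.
    offsets = range(0, days + 1, step)
    # Equivalent of date_trunc('month', ...): snap each date to day 1.
    # The set plays the role of SELECT DISTINCT.
    months = {(today - timedelta(days=o)).replace(day=1) for o in offsets}
    return sorted(months)


months = month_starts_back(date(2008, 12, 6))
print(months[0], months[-1], len(months))
# prints: 2007-12-01 2008-12-01 13
```

The same idea only works for other intervals if the step is no longer than the shortest possible bucket, for example 1 for days or 7 for weeks.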

Other tips

This question is old. But since other users picked it as the master for a new duplicate, I am adding a proper answer.

Proper solution

SELECT *
FROM  (
   SELECT day::date
   FROM   generate_series(timestamp '2007-12-01'
                        , timestamp '2008-12-01'
                        , interval  '1 month') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('month', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2007-12-01'
   AND    date_col <= date '2008-12-06'
-- AND    ... more conditions
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;

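Use LEFT JOIN, of course.
generate_series() can produce a table of timestamps very quickly.
It is generally faster to aggregate before you join. I recently provided a test case on sqlfiddle.com in this related answer:
- PostgreSQL - order by an array
Cast the timestamp to date (::date) for a basic format. For more, use to_char().
GROUP BY 1 is syntax shorthand that references the first output column. It could be GROUP BY day as well, but that might conflict with an existing column of the same name. Or GROUP BY date_trunc('month', date_col)::date, but that is too long for my taste.
Works with any of the interval arguments available for date_trunc().
count() never produces NULL (it returns 0 when there are no rows), but the LEFT JOIN does.
To return 0 instead of NULL in the outer SELECT, use COALESCE(some_count, 0) AS some_count. See the manual.
For a more generic solution or arbitrary time intervals, consider this closely related answer:
- Best way to count records by arbitrary time intervals in Rails+Postgres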
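Because the asker's interval is only known at runtime, the same query can be assembled by the application. The Python sketch below is one way to do that, under stated assumptions: the table and column names (`tbl`, `date_col`) are the placeholders from the query above, `build_query` and `STEPS` are hypothetical names, the `%(start)s` and `%(end)s` placeholders assume a driver with pyformat-style parameters such as psycopg2, and the server is assumed to support generate_series() over timestamps. The unit is checked against a whitelist because it is interpolated into the SQL text.

```python
# date_trunc() units mapped to a generate_series() step.
# 'quarter' is written as '3 months', which is valid interval input
# whether or not the server accepts 'quarter' as an interval unit.
STEPS = {
    "day": "1 day",
    "week": "1 week",
    "month": "1 month",
    "quarter": "3 months",
    "year": "1 year",
}


def build_query(unit):
    """Return the gap-filling query for one whitelisted date_trunc() unit."""
    if unit not in STEPS:
        raise ValueError(f"unsupported interval unit: {unit!r}")
    step = STEPS[unit]
    return f"""
SELECT day, COALESCE(some_count, 0) AS some_count
FROM  (
   SELECT day::date
   FROM   generate_series(date_trunc('{unit}', %(start)s::timestamp)
                        , %(end)s::timestamp
                        , interval '{step}') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('{unit}', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= %(start)s
   AND    date_col <= %(end)s
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
"""


print(build_query("quarter"))
```

The series start is truncated with the same unit as the grouped column so that the join keys line up; for 'week', for instance, both sides then fall on the day that date_trunc() treats as the start of the week. The start and end dates would be passed separately as query parameters, not formatted into the string.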

You can create a temporary table at runtime and left join on that. That seems to make the most sense.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow