Optimizing date queries on large child tables: GiST or GIN?
03-10-2019
Question
Problem
There are 72 child tables, each with a year index and a station index, defined as follows:
CREATE TABLE climate.measurement_12_013
(
-- Inherited from table climate.measurement_12_013: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement_12_013: station_id integer NOT NULL,
-- Inherited from table climate.measurement_12_013: taken date NOT NULL,
-- Inherited from table climate.measurement_12_013: amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement_12_013: category_id smallint NOT NULL,
-- Inherited from table climate.measurement_12_013: flag character varying(1) NOT NULL DEFAULT ' '::character varying,
CONSTRAINT measurement_12_013_category_id_check CHECK (category_id = 7),
CONSTRAINT measurement_12_013_taken_check CHECK (date_part('month'::text, taken)::integer = 12)
)
INHERITS (climate.measurement);
CREATE INDEX measurement_12_013_s_idx
ON climate.measurement_12_013
USING btree
(station_id);
CREATE INDEX measurement_12_013_y_idx
ON climate.measurement_12_013
USING btree
(date_part('year'::text, taken));
(Foreign key constraints will be added later.)
The following query runs slowly because of a full table scan:
SELECT
count(1) AS measurements,
avg(m.amount) AS amount
FROM
climate.measurement m
WHERE
m.station_id IN (
SELECT
s.id
FROM
climate.station s,
climate.city c
WHERE
/* For one city... */
c.id = 5182 AND
/* Where stations are within an elevation range... */
s.elevation BETWEEN 0 AND 3000 AND
/* and within a specific radius... */
6371.009 * SQRT(
POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
(COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2))
) <= 50
) AND
/* Data before 1900 is shaky; insufficient after 2009. */
extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND
/* Whittled down by category... */
m.category_id = 1 AND
/* Between the selected days and years... */
m.taken BETWEEN
/* Start date. */
(extract( YEAR FROM m.taken )||'-01-01')::date AND
/* End date. Calculated by checking to see if the end date wraps
into the next year. If it does, then add 1 to the current year.
*/
(cast(extract( YEAR FROM m.taken ) + greatest( -1 *
sign(
(extract( YEAR FROM m.taken )||'-12-31')::date -
(extract( YEAR FROM m.taken )||'-01-01')::date ), 0
) AS text)||'-12-31')::date
GROUP BY
extract( YEAR FROM m.taken )
The sluggishness comes from this part of the query:
m.taken BETWEEN
/* Start date. */
(extract( YEAR FROM m.taken )||'-01-01')::date AND
/* End date. Calculated by checking to see if the end date wraps
into the next year. If it does, then add 1 to the current year.
*/
(cast(extract( YEAR FROM m.taken ) + greatest( -1 *
sign(
(extract( YEAR FROM m.taken )||'-12-31')::date -
(extract( YEAR FROM m.taken )||'-01-01')::date ), 0
) AS text)||'-12-31')::date
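This part of the query matches a selection of days. For example, if the user wants to look at data between June 1st and July 1st over all years for which there is data, the above clause matches only those days. If the user wants to look at data between December 22nd and March 22nd, over all years for which there is data, the above clause calculates that March 22nd falls in the year after December 22nd, and matches the dates accordingly.
Currently the dates are fixed as January 1st to December 31st, but they will be parameterized, as described above.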
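The wrap-around rule described above can be sketched outside the database. This is a minimal illustration in Python, not the query's actual implementation; the function name `window_for_year` and the `(month, day)` tuple convention are assumptions made for the example.

```python
from datetime import date


def window_for_year(year, start_md, end_md):
    """Return (start_date, end_date) for the seasonal window starting in `year`.

    start_md and end_md are (month, day) tuples (a convention assumed here).
    If the end falls earlier in the calendar than the start, the window is
    taken to wrap into the following year, as the WHERE clause intends.
    Note: a (2, 29) bound raises ValueError in non-leap years.
    """
    wraps = end_md < start_md  # tuple comparison, e.g. (3, 22) < (12, 22)
    start = date(year, *start_md)
    end = date(year + (1 if wraps else 0), *end_md)
    return start, end


# June 1 to July 1 stays inside one year; Dec 22 to Mar 22 spills into the next.
print(window_for_year(1950, (6, 1), (7, 1)))
print(window_for_year(1950, (12, 22), (3, 22)))
```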
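The HashAggregate in the plan shows a cost of 10006220141.11, which I suspect is on the astronomically huge side.
There is a full table scan on the measurement table (which itself has neither data nor indexes). The table aggregates 273 million rows from its child tables.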
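Question
What is the proper way to index the dates?
Options I have considered:
- GIN
- GiST
- Rewrite the WHERE clause
- Separate year_taken, month_taken, and day_taken columns on the tables
What are your thoughts?
Thank you!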
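Solution
Your problem is that you have a WHERE clause that depends on a calculation over the date. If the database has to fetch every row and perform a calculation on it before it knows whether the date matches, it cannot use an index.
Unless you rewrite it so that the database has a fixed range to check, one that does not depend on the data being retrieved, you will always have to scan the table.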
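One way to get fixed ranges is to have the application expand the seasonal window into one literal date range per year before the query is sent. The sketch below is only an illustration under assumptions: the helper name `fixed_range_predicate` is hypothetical, the column name is trusted input, and whether the planner then uses an index depends on a plain index on `taken` existing, which the posted table definitions do not show.

```python
from datetime import date


def fixed_range_predicate(column, first_year, last_year, start_md, end_md):
    """Build an OR-ed list of literal date ranges, one per year.

    The bounds are constants computed here, so the database never has to
    derive them from each row. `column` must be a trusted identifier.
    A (2, 29) bound raises ValueError in non-leap years.
    """
    wraps = end_md < start_md  # the window ends in the following year
    clauses = []
    for year in range(first_year, last_year + 1):
        start = date(year, *start_md)
        end = date(year + (1 if wraps else 0), *end_md)
        clauses.append(
            f"{column} BETWEEN DATE '{start.isoformat()}' "
            f"AND DATE '{end.isoformat()}'"
        )
    return "(" + "\n OR ".join(clauses) + ")"


print(fixed_range_predicate("m.taken", 1900, 1902, (12, 22), (3, 22)))
```

For the full 1900 to 2009 span this produces 110 ranges, which is long but entirely constant.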
Other tips
Try something like this:
create temporary table test (d date);
insert into test select '1970-01-01'::date+generate_series(1,50*365);
analyze test;
create function month_day(d date) returns int as $$
select extract(month from $1)::int*100+extract(day from $1)::int $$
language sql immutable strict;
create index test_d_month_day_idx on test (month_day(d));
explain analyze select * from test
where month_day(d)>=month_day('2000-04-01')
and month_day(d)<=month_day('2000-04-05');
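The example above covers a window that stays inside one calendar year. For a window such as December 22 to March 22, the two bounds would need to be combined with OR rather than AND. The sketch below shows how an application might pick the right form; `month_day_predicate` is a hypothetical helper, and it assumes the SQL function `month_day` from the example above exists in the database.

```python
def month_day(month, day):
    """Mirror of the SQL month_day() function: April 5 becomes 405."""
    return month * 100 + day


def month_day_predicate(expr, start_md, end_md):
    """Return a SQL predicate over `expr`, e.g. "month_day(d)".

    start_md and end_md are (month, day) tuples. When the window wraps past
    December 31 the lower bound is numerically larger than the upper bound,
    so the two comparisons are joined with OR.
    """
    lo = month_day(*start_md)
    hi = month_day(*end_md)
    if lo <= hi:
        return f"{expr} BETWEEN {lo} AND {hi}"
    return f"({expr} >= {lo} OR {expr} <= {hi})"


print(month_day_predicate("month_day(d)", (4, 1), (4, 5)))
print(month_day_predicate("month_day(d)", (12, 22), (3, 22)))
```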
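I think that to run this efficiently across those partitions, I would make your application much smarter about the date ranges. Have it generate an actual list of dates to check per partition, and then have it generate one query with a UNION between the partitions. It sounds like your data set is pretty static, so a CLUSTER on your date index could also improve performance considerably.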
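A rough sketch of that idea follows. It relies on the CHECK constraint in the question, which suggests each child table holds a single month. The partition names are passed in by the caller because the naming scheme is not fully shown, both helper names are hypothetical, and UNION ALL is used instead of UNION on the assumption that the partitions never share rows.

```python
def months_in_window(start_md, end_md):
    """List the months a (month, day) window touches, wrapping past December."""
    start_month, end_month = start_md[0], end_md[0]
    if start_month == end_month and end_md < start_md:
        # e.g. June 20 to June 10 runs through the whole year
        return [(start_month - 1 + i) % 12 + 1 for i in range(12)]
    months = [start_month]
    while months[-1] != end_month:
        months.append(months[-1] % 12 + 1)
    return months


def union_query(table_for_month, start_md, end_md, where_sql):
    """Build one query that reads only the partitions the window needs.

    table_for_month maps a month number to a trusted child table name;
    where_sql is a predicate prepared by the caller.
    """
    parts = [
        f"SELECT taken, amount FROM {table_for_month[m]} WHERE {where_sql}"
        for m in months_in_window(start_md, end_md)
    ]
    return "\nUNION ALL\n".join(parts)


# Hypothetical partition names, for illustration only.
tables = {m: f"climate.measurement_{m:02d}_example" for m in range(1, 13)}
print(union_query(tables, (12, 22), (3, 22), "station_id = 1"))
```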