PostgreSQL: LEFT JOIN creates blank rows
14-12-2019
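Question
See important new discoveries 1 and 2 at the end of this explanation.
I am running Postgres 9.1.3, and I have a strange LEFT JOIN problem.
I have a table named consistent.master with over 2 million rows. It has a column named citation_id, and that column contains no nulls. I can verify that with this: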
SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL
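This returns 0.
Here is where it gets strange: if I LEFT JOIN this table to a temporary table, I get an error saying that I am trying to insert a null into the citation_id field:
ERROR: null value in column "citation_id" violates not-null constraint
SQL state: 23502
Here is the query: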
WITH stops AS (
SELECT citation_id,
rank() OVER (ORDER BY offense_timestamp,
defendant_dl,
offense_street_number,
offense_street_name) AS stop
FROM consistent.master
WHERE citing_jurisdiction=1
)
INSERT INTO consistent.masternew (arrest_id, citation_id, defendant_dl, defendant_dl_state, defendant_zip, defendant_race, defendant_sex, defendant_dob, vehicle_licenseplate, vehicle_licenseplate_state, vehicle_registration_expiration_date, vehicle_year, vehicle_make, vehicle_model, vehicle_color, offense_timestamp, offense_street_number, offense_street_name, offense_crossstreet_number, offense_crossstreet_name, offense_county, officer_id, offense_code, speed_alleged, speed_limit, work_zone, school_zone, offense_location, id, source, citing_jurisdiction, the_geom)
SELECT stops.stop, master.citation_id, defendant_dl, defendant_dl_state, defendant_zip, defendant_race, defendant_sex, defendant_dob, vehicle_licenseplate, vehicle_licenseplate_state, vehicle_registration_expiration_date, vehicle_year, vehicle_make, vehicle_model, vehicle_color, offense_timestamp, offense_street_number, offense_street_name, offense_crossstreet_number, offense_crossstreet_name, offense_county, officer_id, offense_code, speed_alleged, speed_limit, work_zone, school_zone, offense_location, id, source, citing_jurisdiction, the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id
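I am scratching my head. If this is a LEFT JOIN, and consistent.master is the left table of the join, how can this query possibly produce null values in the citation_id column when there were none to begin with?
Here is the SQL code I used to create the table: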
CREATE TABLE consistent.masternew
(
arrest_id character varying(20),
citation_id character varying(20) NOT NULL,
defendant_dl character varying(20),
defendant_dl_state character varying(2),
defendant_zip character varying(9),
defendant_race character varying(10),
defendant_sex character(1),
defendant_dob date,
vehicle_licenseplate character varying(10),
vehicle_licenseplate_state character(2),
vehicle_registration_expiration_date date,
vehicle_year integer,
vehicle_make character varying(20),
vehicle_model character varying(20),
vehicle_color character varying,
offense_timestamp timestamp without time zone,
offense_street_number character varying(10),
offense_street_name character varying(30),
offense_crossstreet_number character varying(10),
offense_crossstreet_name character varying(30),
offense_county character varying(10),
officer_id character varying(20),
offense_code integer,
speed_alleged integer,
speed_limit integer,
work_zone bit(1),
school_zone bit(1),
offense_location point,
id serial NOT NULL,
source character varying(20), -- Where this citation came from--court, PD, etc.
citing_jurisdiction integer,
the_geom geometry,
CONSTRAINT masternew_pkey PRIMARY KEY (id ),
CONSTRAINT citing_jurisdiction FOREIGN KEY (citing_jurisdiction)
REFERENCES consistent.jurisdictions (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT offenses FOREIGN KEY (offense_code)
REFERENCES consistent.offenses (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL),
CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 3081)
)
WITH (
OIDS=FALSE
);
ALTER TABLE consistent.masternew
OWNER TO postgres;
COMMENT ON COLUMN consistent.masternew.source IS 'Where this citation came from--court, PD, etc.';
CREATE INDEX masternew_citation_id_idx
ON consistent.masternew
USING btree
(citation_id COLLATE pg_catalog."default" );
CREATE INDEX masternew_citing_jurisdiction_idx
ON consistent.masternew
USING btree
(citing_jurisdiction );
CREATE INDEX masternew_defendant_dl_idx
ON consistent.masternew
USING btree
(defendant_dl COLLATE pg_catalog."default" );
CREATE INDEX masternew_id_idx
ON consistent.masternew
USING btree
(id );
CREATE INDEX masternew_offense_street_name_idx
ON consistent.masternew
USING btree
(offense_street_name COLLATE pg_catalog."default" );
CREATE INDEX masternew_offense_street_number_idx
ON consistent.masternew
USING btree
(offense_street_number COLLATE pg_catalog."default" );
CREATE INDEX masternew_offense_timestamp_idx
ON consistent.masternew
USING btree
(offense_timestamp );
CREATE INDEX masternew_the_geom_idx
ON consistent.masternew
USING gist
(the_geom );
Important discovery 1
I just discovered something interesting. This query:
SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NOT NULL
UNION
SELECT COUNT(*)
FROM consistent.master
UNION
SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL
results in:
2085344
2085343
0
How do I explain that? How can the count with WHERE citation_id IS NOT NULL possibly be higher than the count from the same query with no WHERE clause?
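One thing that makes the result above hard to read is that UNION removes duplicate rows and does not guarantee output order, so the three unlabeled numbers do not say which count is which. A minimal sketch that computes all three counts in one row against the same table; the column aliases are made up for illustration:

```sql
-- count(*) counts every row; count(citation_id) skips rows where it is NULL.
SELECT count(*)                      AS total_rows,
       count(citation_id)            AS citation_id_not_null,
       count(*) - count(citation_id) AS citation_id_null
FROM   consistent.master;
```

Because all three values come from a single scan of the table, they also cannot drift apart if rows are inserted or deleted between separate counts.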
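Important discovery 2
OK, per the comments below, I found that I had a row with all null values, despite the fact that the table has a serial id column and a few NOT NULL constraints.
I deleted the bum row. Now I no longer get the null error. Instead, I get this: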
ERROR: duplicate key value violates unique constraint "masternew_pkey"
DETAIL: Key (id)=(1583804) already exists.
********** Error **********
ERROR: duplicate key value violates unique constraint "masternew_pkey"
SQL state: 23505
Detail: Key (id)=(1583804) already exists.
So just to make sure, I ran this query:
SELECT COUNT(id)
FROM consistent.master
WHERE id=1583804;
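Guess what? There is only 1 instance! So given that there is only one instance of id 1583804 in the left table of the LEFT JOIN, and the id column can only come from the left table, how can this error possibly happen? A LEFT JOIN like this should not cause the final result to have more rows than the left table, right?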
Solution
With INSERTs, especially complex ones, you should always define the target columns. So:
INSERT INTO consistent.masternew (citation_id, col1, col2, ...).
If anything goes wrong in the accompanying SELECT statement, as here:
the_geom geometry
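(there is no point in renaming a column to a type name; I assume this is an oversight), or if the underlying table definition changes, an INSERT statement without defined target columns can go dangerously wrong.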
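PostgreSQL does not require the SELECT statement to supply the same number of columns as the target table has. I quote the fine manual:
Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or null if there is none.
(Bold emphasis mine.) If there is a mismatch in the column list, this can make null values appear "out of nowhere".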
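The rule quoted above can be tried on a throwaway table. This is a minimal sketch, assuming a hypothetical temp table demo_target that is not part of the schema in the question:

```sql
-- Hypothetical demo table, for illustration only.
CREATE TEMP TABLE demo_target (
    id          serial PRIMARY KEY,
    citation_id varchar(20) NOT NULL,
    source      varchar(20) DEFAULT 'unknown',
    note        text                     -- no default: omitted means NULL
);

-- Only citation_id is listed. id and source get their declared defaults,
-- note has no default and is filled with NULL.
INSERT INTO demo_target (citation_id)
VALUES ('C-1');

SELECT id, citation_id, source, note
FROM   demo_target;

-- Run this last: it is expected to fail. citation_id is not listed and has
-- no default, so PostgreSQL fills in NULL and the NOT NULL constraint
-- rejects the row (SQLSTATE 23502).
INSERT INTO demo_target (note)
VALUES ('no citation');
```

The last statement raises the same kind of not-null violation as in the question, although no NULL was written anywhere in the statement itself.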
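Also, the order of the columns in the SELECT statement must match the order of the columns to be inserted into. If the target columns are not spelled out, this is the order of the columns in the table as it was created. You seem to expect that columns are matched by name automatically, but that is not so. The column names in the SELECT statement are completely irrelevant to the final step of the INSERT. Only their order from left to right is significant.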
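A small sketch of this positional matching, using two hypothetical temp tables (demo_src and demo_dst are made up for illustration and are not part of the original schema):

```sql
CREATE TEMP TABLE demo_src (arrest_id varchar(20), citation_id varchar(20));
CREATE TEMP TABLE demo_dst (arrest_id varchar(20), citation_id varchar(20));

INSERT INTO demo_src (arrest_id, citation_id)
VALUES ('A-1', 'C-1');

-- The SELECT list is in the opposite order of the target list.
-- Values are assigned by position, so the matching names do not help.
INSERT INTO demo_dst (citation_id, arrest_id)
SELECT arrest_id, citation_id
FROM   demo_src;

SELECT arrest_id, citation_id
FROM   demo_dst;
-- arrest_id = 'C-1', citation_id = 'A-1': the two values ended up swapped
```

Listing the SELECT columns in the same order as the target list, one per line, makes this kind of mismatch easier to spot in a statement with 30 or more columns.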
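Contrary to what others have implied, the WITH clause is perfectly legitimate. I quote the manual on INSERT:
It is possible for the query (SELECT statement) to also contain a WITH clause. In such a case both sets of with_query can be referenced within the query, but the second one takes precedence since it is more closely nested.
Your statement could look like this: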
WITH stops AS (
SELECT citation_id
,rank() OVER (ORDER BY
offense_timestamp
,defendant_dl
,offense_street_number
,offense_street_name) AS stop
FROM consistent.master
WHERE citing_jurisdiction = 1
)
INSERT INTO consistent.masternew (citation_id, col1, col2, ...) -- add columns
SELECT m.citation_id -- order columns accordingly!
,s.stop
,m.defendant_dl
-- 27 more columns
,m.citing_jurisdiction
,m.the_geom
FROM consistent.master m
LEFT JOIN stops s USING (citation_id);
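On the duplicate key error from the question: a LEFT JOIN returns one output row per matching right-hand row, so a row of the left table is repeated whenever its citation_id matches more than one row in stops. Whether that is what happens here is an assumption; the question does not say whether citation_id is unique in consistent.master. A minimal check, using only the table and columns from the question (the alias copies is made up):

```sql
-- citation_id values that occur more than once among the rows feeding the
-- stops CTE. Each master row with such a citation_id would be joined to
-- several stops rows, so its id would be inserted more than once.
SELECT citation_id,
       count(*) AS copies
FROM   consistent.master
WHERE  citing_jurisdiction = 1
GROUP  BY citation_id
HAVING count(*) > 1
ORDER  BY copies DESC
LIMIT  20;
```

If this returns no rows, the join cannot multiply rows and the cause has to be looked for elsewhere.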
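Other tips
At a guess, I would say you are inserting stops.stop, which can be null, into the citation_id column, but without knowing the table structure I cannot say for sure :)
Edit: Try @vol7ron's suggestion and name the columns...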