SQL query to find all parent records where child records are a specific set
-
05-10-2020 - |
题
An Item has many ItemDetails. An ItemDetail has fields of "type", "value" and "item_id".
I need to find all Items if and only if item has exact ItemDetails which are restricted by some changeable conditions. For example I need to find all Items with ItemDetails of (type=10, value=1000) and (type=20 and value=2000)
My first solution was like this:
select p.*
from item p
where not exists
(
select c.id from item_detail c
where c.item_id=p.id
and (c.type<>10 or c.value<>1000)
and (c.type<>20 or c.value<>2000)
);
-- Execution Time: 17.819 ms
But I realised that it was fetching items with only one ItemDetail(type=10, value=1000). Then I found this question and changed query like below.
select p.*
from item p
where not exists
(
select c.id from item_detail c
where c.item_id=p.id
and (c.type<>10 or c.value<>1000)
and (c.type<>20 or c.value<>2000)
)
and 2 = (
select count(c.item_id) from item_detail c
where c.item_id=p.id);
-- Execution Time: 2426.596 ms
But second subquery is causing performance problem. Execution time for first query is 5 ms, but for second it is 800 ms. Is there a better way of doing this?
I am using PostgreSQL 9.5.
Here is the fiddle for it.
Edit-1: Here is fiddle example for those who cannot reach it:
CREATE TABLE public.item
(
id integer NOT NULL,
name character varying(10) NOT NULL,
CONSTRAINT item_pkey PRIMARY KEY (id)
);
CREATE TABLE public.item_detail
(
id bigint NOT NULL,
item_id integer NOT NULL,
type integer NOT NULL,
value integer NOT NULL,
CONSTRAINT item_detail_pkey PRIMARY KEY (id),
CONSTRAINT fk_item_id FOREIGN KEY (item_id)
REFERENCES item (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT uq_item_type_value UNIQUE (item_id, type, value)
);
INSERT INTO public.item VALUES (1, 'Item1'),(2, 'Item2'),(3, 'Item3'),(4, 'Item4'),(5, 'Item5');
INSERT INTO public.item_detail
VALUES
(1,1,10,1000),
(2,1,20,2000),
(3,2,10,1000),
(4,3,10,1000),
(5,3,20,2000),
(6,3,30,3000),
(7,4,10,1000),
(8,4,10,1500);
Edit-2: I came up with a solution like this by choosing the best option for me from this source given by ypercubeᵀᴹ.
SELECT p.* FROM item p
WHERE EXISTS (SELECT item_id FROM item_detail
WHERE item_id = p.id AND (type, value) = (10, 1000))
AND EXISTS (SELECT item_id FROM item_detail
WHERE item_id = p.id AND (type, value) = (20, 2000))
AND NOT EXISTS (SELECT item_id FROM item_detail
WHERE item_id = p.id AND (type, value) NOT IN ((10, 1000), (20,2000)));
-- Execution Time: 0.984 ms
解决方案
The problem we are trying to solve is called exact relational division. It requires an extra condition from the relational division and there are still a lot (too many actually) ways to solve this kind of problem.
Check this related SO question where you will find more than 10 different ways to do this in Postgres and performance tests. The problem discussed there is (simple, not exact) relational division so you'll have to adjust the answers:
How to filter SQL results in a has-many-through relation
Here is another one:
select p.*
from item p
join
( select item_id from item_detail
where (type, value) = (10, 1000)
intersect
select item_id from item_detail
where (type, value) = (20, 2000)
except
select item_id from item_detail
where (type, value) not in ((10, 1000), (20, 2000))
) as c
on c.item_id = p.id ;
I suggest an index on (type, value, item_id)
.
其他提示
PostgreSQL specific query
SELECT
i.id, i.name
FROM
item i
WHERE
ARRAY[(10, 1000), (20, 2000)] /* must be sorted! */
=
ARRAY(SELECT (type, value)
FROM item_detail d
WHERE d.item_id = i.id
ORDER BY type /* sorted! */)
This query relies on the fact that you can turn any SELECT
into an ARRAY
of ROW
s, and then use the =
(array equality) array operator.
This is PostgreSQL specific. Being non-standard, it can cause problems if you ever want to switch from one database to another. It's got one advantage: it's shorter to express, and probably quicker to execute. (This probably implies: try and compare in your real case).
A standard alternative:
If you want to find item
s that have several related details, all at the same time, you should write it by having as many EXISTS
, plus one extra condition to make sure that there are no-more (and that can be a count of the related records, or it could also be a NOT EXISTS):
SELECT
i.id, i.name
FROM
item i
WHERE
-- There is a (type, value) = (10, 1000) item_detail related to this item
EXISTS
(SELECT *
FROM item_detail d
WHERE d.item_id = i.id AND d.type=10 AND d.value = 1000)
AND
-- There is a (type, value) = (20, 2000) item_detail related to this item
EXISTS
(SELECT *
FROM item_detail d
WHERE d.item_id = i.id AND d.type=20 AND d.value = 2000)
AND
-- There are NO MORE item_details related (there must be 2, and only 2)
2 = (SELECT count(*)
FROM item_detail d
WHERE d.item_id = i.id) ;
According to @Beytun comments, execution times are:
Added execution time of all queries in question and answers. For your first query it is 829.786 ms, and for the second one is 9.390 ms.
So, the PostgreSQL specific option is much much worse (about 2 orders of magnitude) than using the standard one. So, even if takes some extra typing (specially if you have a relatively large set of restrictions)... the choice is obvious.
First challenge is to find same item_id which have both combinations of fields. I think the following should work:
select p.* from item p where p.id in ( select c.item_id from item_detail c where (c.type = '10' and c.value '1000') or (c.type = '20' and c.value '2000') group by c.item_id having count(*) > 1 );