SQL query to find all parent records where child records are a specific set

https://dba.stackexchange.com/questions/161433

05-10-2020
|

题

An Item has many ItemDetails. An ItemDetail has fields of "type", "value" and "item_id".

I need to find all Items if and only if item has exact ItemDetails which are restricted by some changeable conditions. For example I need to find all Items with ItemDetails of (type=10, value=1000) and (type=20 and value=2000)

My first solution was like this:

select p.*
from item p 
where not exists 
    ( 
      select c.id from item_detail c
      where c.item_id=p.id
      and (c.type<>10 or c.value<>1000)
      and (c.type<>20 or c.value<>2000)
    );
-- Execution Time: 17.819 ms

But I realised that it was fetching items with only one ItemDetail(type=10, value=1000). Then I found this question and changed query like below.

select p.*
from item p 
where not exists 
    ( 
      select c.id from item_detail c
      where c.item_id=p.id
      and (c.type<>10 or c.value<>1000)
      and (c.type<>20 or c.value<>2000)
    )
and 2 = (
  select count(c.item_id) from item_detail c
  where c.item_id=p.id);
-- Execution Time: 2426.596 ms

But second subquery is causing performance problem. Execution time for first query is 5 ms, but for second it is 800 ms. Is there a better way of doing this?

I am using PostgreSQL 9.5.

Here is the fiddle for it.

Edit-1: Here is fiddle example for those who cannot reach it:

CREATE TABLE public.item
(
  id integer NOT NULL,
  name  character varying(10) NOT NULL, 
  CONSTRAINT item_pkey PRIMARY KEY (id)
);

CREATE TABLE public.item_detail
(
  id bigint NOT NULL,
  item_id integer NOT NULL,
  type  integer NOT NULL,
  value integer NOT NULL,
  CONSTRAINT item_detail_pkey PRIMARY KEY (id),
  CONSTRAINT fk_item_id FOREIGN KEY (item_id)
      REFERENCES item (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT uq_item_type_value UNIQUE (item_id, type, value)
);

INSERT INTO public.item VALUES (1, 'Item1'),(2, 'Item2'),(3, 'Item3'),(4, 'Item4'),(5, 'Item5');

INSERT INTO public.item_detail
VALUES
  (1,1,10,1000),
  (2,1,20,2000),
  (3,2,10,1000),
  (4,3,10,1000),
  (5,3,20,2000),
  (6,3,30,3000),
  (7,4,10,1000),
  (8,4,10,1500);

Edit-2: I came up with a solution like this by choosing the best option for me from this source given by ypercubeᵀᴹ.

SELECT p.* FROM item p 
WHERE EXISTS (SELECT item_id FROM item_detail 
                WHERE item_id = p.id AND (type, value) = (10, 1000))
AND   EXISTS (SELECT item_id FROM item_detail 
                WHERE item_id = p.id AND (type, value) = (20, 2000))
AND NOT EXISTS (SELECT item_id FROM item_detail 
                WHERE item_id = p.id AND (type, value) NOT IN ((10, 1000), (20,2000)));
-- Execution Time: 0.984 ms

解决方案

The problem we are trying to solve is called exact relational division. It requires an extra condition from the relational division and there are still a lot (too many actually) ways to solve this kind of problem.

Check this related SO question where you will find more than 10 different ways to do this in Postgres and performance tests. The problem discussed there is (simple, not exact) relational division so you'll have to adjust the answers:

How to filter SQL results in a has-many-through relation

Here is another one:

select p.*
from item p 
  join 
    ( select item_id from item_detail 
      where (type, value) = (10, 1000)
      intersect
      select item_id from item_detail 
      where (type, value) = (20, 2000)
      except
      select item_id from item_detail 
      where (type, value) not in ((10, 1000), (20, 2000))
    ) as c
  on c.item_id = p.id ;

I suggest an index on (type, value, item_id).

其他提示

PostgreSQL specific query

SELECT
    i.id, i.name
FROM
    item i
WHERE
    ARRAY[(10, 1000), (20, 2000)] /* must be sorted! */ 
    = 
    ARRAY(SELECT (type, value)
            FROM item_detail d 
           WHERE d.item_id = i.id
        ORDER BY type /* sorted! */)

This query relies on the fact that you can turn any SELECT into an ARRAY of ROWs, and then use the = (array equality) array operator.

This is PostgreSQL specific. Being non-standard, it can cause problems if you ever want to switch from one database to another. It's got one advantage: it's shorter to express, and probably quicker to execute. (This probably implies: try and compare in your real case).

A standard alternative:

If you want to find items that have several related details, all at the same time, you should write it by having as many EXISTS, plus one extra condition to make sure that there are no-more (and that can be a count of the related records, or it could also be a NOT EXISTS):

SELECT
    i.id, i.name
FROM
    item i
WHERE
    -- There is a (type, value) = (10, 1000) item_detail related to this item
    EXISTS 
       (SELECT * 
          FROM item_detail d 
         WHERE d.item_id = i.id AND d.type=10 AND d.value = 1000)
    AND 
    -- There is a (type, value) = (20, 2000) item_detail related to this item
    EXISTS 
       (SELECT * 
          FROM item_detail d 
         WHERE d.item_id = i.id AND d.type=20 AND d.value = 2000) 
    AND
    -- There are NO MORE item_details related (there must be 2, and only 2)
       2 = (SELECT count(*) 
              FROM item_detail d 
             WHERE d.item_id = i.id) ;

According to @Beytun comments, execution times are:

Added execution time of all queries in question and answers. For your first query it is 829.786 ms, and for the second one is 9.390 ms.

So, the PostgreSQL specific option is much much worse (about 2 orders of magnitude) than using the standard one. So, even if takes some extra typing (specially if you have a relatively large set of restrictions)... the choice is obvious.

First challenge is to find same item_id which have both combinations of fields. I think the following should work:

select p.*
from item p 
where p.id in
    ( 
      select c.item_id
      from item_detail c
        where (c.type = '10' and c.value '1000') or (c.type = '20' and c.value '2000')
        group by c.item_id
        having count(*) > 1
    );

许可以下： CC-BY-SA 和归因

不隶属于 dba.stackexchange