Optimization: insert only rows that do not already exist, with additional conditions

https://dba.stackexchange.com/questions/162551

05-10-2020

Question

I have two databases:

The 'target' database looks like this:

CREATE TABLE parent (
  erow_id  integer PRIMARY KEY,
  -- these two columns form a composite key (uid)
  uid_p1   integer,
  uid_p2   integer
);

CREATE TABLE child (
  erow_id  integer PRIMARY KEY,
  parent   integer, -- points to parent.erow_id
  value    text,
  vtype    integer
);

-- this is the only index present
CREATE INDEX idx_child_parent ON child (parent);

The 'patch' database (an attached database) looks like this:

CREATE TABLE source (
  value   text,
  -- composite uid pointing to target.parent
  uid_p1  int,
  uid_p2  int
);

Both databases are 'fixed': it is really difficult to modify them, even just to add indexes.

I need to insert new rows (values) from patch into child, but only for parents that have no child rows with vtype = 100 (for example).

At the moment I use this ugly query:

INSERT INTO target.child (value, parent, vtype)
  SELECT 
    p1.value, target.parent.erow_id, 100
  FROM
    patch.source p1
    INNER JOIN target.parent 
      ON (target.parent.uid_p1 = p1.uid_p1 
      AND target.parent.uid_p2 = p1.uid_p2)
  WHERE NOT EXISTS (
    SELECT  
      1
    FROM
      patch.source p2, 
      target.child,
      target.parent
    WHERE 
      (p1.rowid = p2.rowid) AND
      (target.child.vtype = 100) AND
      (target.child.parent = target.parent.erow_id) AND
      (target.parent.uid_p1 = p2.uid_p1) AND
      (target.parent.uid_p2 = p2.uid_p2)
  );

EXPLAIN QUERY PLAN:

SCAN TABLE patch.source AS p1
EXECUTE CORRELATED SCALAR SUBQUERY
SEARCH TABLE patch.source AS p2 USING INTEGER PRIMARY KEY (rowid=?)
SCAN TABLE target.child // looks like a bottleneck
SEARCH TABLE target.parent USING INTEGER PRIMARY KEY (rowid=?)
SEARCH TABLE target.parent USING AUTOMATIC COVERING INDEX (uid_p1=? AND uid_p2=?)

Is it possible to optimize this incredibly slow query without adding new indexes? This is a big problem, because I cannot modify the databases (at least not the target one) at the moment.

Thank you.

Answer

Leaving indexes aside (which you say you cannot add), the query is unnecessarily complex. Two of the three table references in the correlated subquery are not needed: patch.source p2 is joined back to p1 on its rowid, and target.parent is joined on the same composite uid as in the main query, so both only repeat rows the outer query already has. You can simplify it to:

INSERT INTO target.child (value, parent, vtype)
  SELECT 
    p1.value, p.erow_id, 100
  FROM
    patch.source AS p1
    INNER JOIN target.parent AS p
      ON  p.uid_p1 = p1.uid_p1 
      AND p.uid_p2 = p1.uid_p2
  WHERE NOT EXISTS (
    SELECT  
      1
    FROM
      target.child AS c
    WHERE 
      c.vtype = 100 AND
      c.parent = p.erow_id
  );
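
A minimal sketch for trying the simplified query on scratch data, assuming Python's standard sqlite3 module and two in-memory databases attached under the names target and patch. The sample rows are made up for illustration; only the schema and the query come from the post. It also prints the query plan, so you can check which index, if any, the subquery on child uses.

```python
import sqlite3

# Scratch in-memory databases standing in for the real 'target' and 'patch'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
ATTACH DATABASE ':memory:' AS target;
ATTACH DATABASE ':memory:' AS patch;

CREATE TABLE target.parent (
  erow_id integer PRIMARY KEY,
  uid_p1  integer,
  uid_p2  integer
);
CREATE TABLE target.child (
  erow_id integer PRIMARY KEY,
  parent  integer,
  value   text,
  vtype   integer
);
CREATE INDEX target.idx_child_parent ON child (parent);
CREATE TABLE patch.source (value text, uid_p1 int, uid_p2 int);

-- Made-up rows: parent 1 already has a vtype = 100 child, parent 2 does not.
INSERT INTO target.parent VALUES (1, 10, 11), (2, 20, 21);
INSERT INTO target.child (parent, value, vtype)
  VALUES (1, 'old', 100), (2, 'other', 5);
INSERT INTO patch.source VALUES ('new-a', 10, 11), ('new-b', 20, 21);
""")

INSERT_SQL = """
INSERT INTO target.child (value, parent, vtype)
  SELECT p1.value, p.erow_id, 100
  FROM patch.source AS p1
    INNER JOIN target.parent AS p
      ON  p.uid_p1 = p1.uid_p1
      AND p.uid_p2 = p1.uid_p2
  WHERE NOT EXISTS (
    SELECT 1
    FROM target.child AS c
    WHERE c.vtype = 100 AND c.parent = p.erow_id
  )
"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + INSERT_SQL)]
for detail in plan:
    print(detail)

conn.execute(INSERT_SQL)
conn.commit()

rows = conn.execute(
    "SELECT parent, value FROM target.child WHERE vtype = 100 ORDER BY parent"
).fetchall()
print(rows)  # [(1, 'old'), (2, 'new-b')]
```

Parent 1 keeps its existing row and 'new-a' is skipped, while parent 2 receives 'new-b'. The exact wording of the plan lines differs between SQLite versions, so compare them on your own build rather than against the output in the question.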

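With only child left in the subquery, the lookup on c.parent = p.erow_id should be able to use the existing idx_child_parent index instead of scanning child for every source row. An index on child (vtype, parent) would probably help performance further. If that index is UNIQUE, the query can be simplified even more with the OR IGNORE clause: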

INSERT OR IGNORE
INTO target.child (value, parent, vtype)
  SELECT 
    p1.value, p.erow_id, 100
  FROM
    patch.source AS p1
    INNER JOIN target.parent AS p
      ON  p.uid_p1 = p1.uid_p1 
      AND p.uid_p2 = p1.uid_p2 ;

What OR IGNORE does, as described in the SQLite documentation:

IGNORE

When an applicable constraint violation occurs, the IGNORE resolution algorithm skips the one row that contains the constraint violation and continues processing subsequent rows of the SQL statement as if nothing went wrong. Other rows before and after the row that contained the constraint violation are inserted or updated normally. No error is returned when the IGNORE conflict resolution algorithm is used.
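
The behaviour quoted above can be tried with a small sketch, again assuming Python's sqlite3 module and in-memory stand-ins for target and patch. The UNIQUE index idx_child_vtype_parent is hypothetical: it is not part of the schema in the question, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
ATTACH DATABASE ':memory:' AS target;
ATTACH DATABASE ':memory:' AS patch;

CREATE TABLE target.parent (
  erow_id integer PRIMARY KEY,
  uid_p1  integer,
  uid_p2  integer
);
CREATE TABLE target.child (
  erow_id integer PRIMARY KEY,
  parent  integer,
  value   text,
  vtype   integer
);
-- Assumption: this UNIQUE index does not exist in the original schema.
CREATE UNIQUE INDEX target.idx_child_vtype_parent ON child (vtype, parent);
CREATE TABLE patch.source (value text, uid_p1 int, uid_p2 int);

-- Made-up rows: parent 1 already has a vtype = 100 child.
INSERT INTO target.parent VALUES (1, 10, 11), (2, 20, 21);
INSERT INTO target.child (parent, value, vtype) VALUES (1, 'old', 100);
-- One patch row targets parent 1, two target parent 2.
INSERT INTO patch.source
  VALUES ('new-a', 10, 11), ('new-b', 20, 21), ('new-c', 20, 21);
""")

# Rows that would violate the UNIQUE index are skipped without an error.
conn.execute("""
INSERT OR IGNORE INTO target.child (value, parent, vtype)
  SELECT p1.value, p.erow_id, 100
  FROM patch.source AS p1
    INNER JOIN target.parent AS p
      ON  p.uid_p1 = p1.uid_p1
      AND p.uid_p2 = p1.uid_p2
""")
conn.commit()

counts = conn.execute(
    "SELECT parent, COUNT(*) FROM target.child "
    "WHERE vtype = 100 GROUP BY parent ORDER BY parent"
).fetchall()
kept_for_parent_1 = conn.execute(
    "SELECT value FROM target.child WHERE parent = 1"
).fetchall()
print(counts)             # [(1, 1), (2, 1)]
print(kept_for_parent_1)  # [('old',)]
```

Note one side effect of the UNIQUE index: each parent can hold only one row per vtype, so when patch.source contains several rows for the same parent, only one of them is kept, and which one depends on the order in which the SELECT produces them. Whether that matches the intent of the NOT EXISTS version is worth checking against your data.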

License: CC-BY-SA with attribution

Not affiliated with dba.stackexchange