Optimization: insert only not existing rows with additional conditions
-
05-10-2020 - |
문제
I have two databases:
'target' database looks like
CREATE TABLE parent (
erow_id integer PRIMARY KEY,
-- these two columns is a composite key (uid)
uid_p1 integer,
uid_p2 integer,
);
CREATE TABLE child (
erow_id integer PRIMARY KEY,
parent integer, -- pointed to parent.erow_id
value text,
vtype integer
);
-- only this one index is presented
CREATE INDEX idx_child_parent ON child (parent);
'patch' (it's attached database) looks like
CREATE TABLE source (
value text,
-- composite uid pointed to target.parent
uid_p1 int,
uid_p2 int
);
Both databases are 'fixed' and it's really difficult to modify'em, even add indexes.
I need to insert new rows (values) to child
from patch
only if parent
has no rows of type = 100 (for example).
At this moment I use such an ugly query:
INSERT INTO target.child (value, parent, vtype)
SELECT
p1.value, target.parent.erow_id, 100
FROM
patch.source p1
INNER JOIN target.parent
ON (target.parent.uid_p1 = p1.uid_p1
AND target.parent.uid_p2 = p1.uid_p2)
WHERE NOT EXISTS (
SELECT
1
FROM
patch.source p2,
target.child,
target.parent
WHERE
(p1.rowid = p2.rowid) AND
(target.child.vtype = 100) AND
(target.child.parent = target.parent.erow_id) AND
(target.parent.uid_p1 = p2.uid_p1) AND
(target.parent.uid_p2 = p2.uid_p2)
);
EXPLAIN QUERY PLAN
:
SCAN TABLE patch.source AS p1
EXECUTE CORRELATED SCALAR SUBQUERY
SEARCH TABLE patch.source AS p2 USING INTEGER PRIMARY KEY (rowid=?)
SCAN TABLE target.child
// looks like a bottleneckSEARCH TABLE target.parent USING INTEGER PRIMARY KEY (rowid=?)
SEARCH TABLE target.parent USING AUTOMATIC COVERING INDEX (uid_p1=? AND uid_p2=?)
Is it possible to optimize this incredibly slow query w/o new indexes? It's a big problem - I cannot modify at least target
db at this moment.
Thank you.
해결책
Besides adding indexes (which you say is not allowed), the query is unnecessarily complex. The 2 of the 3 table references in the correlated subquery are not needed as they join to the same tables in the main query and on primary keys. You can simplify it to:
INSERT INTO target.child (value, parent, vtype)
SELECT
p1.value, p.erow_id, 100
FROM
patch.source AS p1
INNER JOIN target.parent AS p
ON p.uid_p1 = p1.uid_p1
AND p.uid_p2 = p1.uid_p2
WHERE NOT EXISTS (
SELECT
1
FROM
target.child AS c
WHERE
c.vtype = 100 AND
c.parent = p.erow_id
);
An index on child (vtype, parent)
would help I think for performance. If that index is UNIQUE
, the query can be simplified further using OR IGNORE
clause:
INSERT OR IGNORE
INTO target.child (value, parent, vtype)
SELECT
p1.value, p.erow_id, 100
FROM
patch.source AS p1
INNER JOIN target.parent AS p
ON p.uid_p1 = p1.uid_p1
AND p.uid_p2 = p1.uid_p2 ;
What OR IGNORE
does:
IGNORE
When an applicable constraint violation occurs, the
IGNORE
resolution algorithm skips the one row that contains the constraint violation and continues processing subsequent rows of the SQL statement as if nothing went wrong. Other rows before and after the row that contained the constraint violation are inserted or updated normally. No error is returned when theIGNORE
conflict resolution algorithm is used.