How does SQL Server process DELETE WHERE EXISTS (SELECT 1 FROM TABLE)?
-
13-03-2021 - |
Pergunta
The below is a valid TSQL statement. But I want to understand how SQL Server processes it.
DELETE A
FROM table1 WITH (NOLOCK)
WHERE
EXISTS
(
SELECT 1
FROM
table2 B WITH (NOLOCK)
WHERE
B.id = A.id
)
Because the output of the subquery will be a list of 1s. How does SQL Server know which rows to delete?
Solução
How does SQL Server know which rows to delete?
To understand how it's processed, it's probably more helpful to look at the execution plan for the query. Setup scripts are at the end.
First, let's change the query a little bit to something that should throw an error, but doesn't.
DELETE A
FROM table1 AS A
WHERE EXISTS
(
SELECT 1/0
FROM table2 B
WHERE B.id = A.id
);
If you were to just run SELECT 1/0
you'd get a divide by zero error. But there are places where expressions are present in the query for parsing, but aren't actually projected by the optimizer.
The "list of ones" you're referring to doesn't ever really appear. All the work to identify rows to delete is done via the where clause in the exists.
Reading the plan from right to left is the correct method when you need to understand the flow of data. Left to right is the flow of logic.
We have:
- A full scan of both tables
- A hash join to match rows
- A delete from
table1
The hash join figures out which rows need to go, and passes qualifying bookmark values on to the delete operator. In this case, bookmarks are used to uniquely identify rows because there's no clustered index available to use keys from.
With a unique clustered index, key values alone can be used to identify rows. With a non-unique clustered index, an additional unique-ifier will accompany any duplicate values, though this won't be visible to you.
Setup scripts
DROP TABLE IF EXISTS table1, table2;
CREATE TABLE dbo.table1 (id int NOT NULL);
CREATE TABLE dbo.table2 (id int NOT NULL);
INSERT
dbo.table1 WITH(TABLOCK) (id)
SELECT
x.*
FROM
(
SELECT
ROW_NUMBER() OVER(ORDER BY 1/0) AS n
FROM sys.messages AS m
) AS x;
INSERT
dbo.table2 WITH(TABLOCK) (id)
SELECT
t.*
FROM dbo.table1 AS t
WHERE id % 2 = 0;
SELECT TOP (100) t1.* FROM dbo.table1 AS t1;
SELECT TOP (100) t2.* FROM dbo.table2 AS t2;
Outras dicas
the server goes through every row of table1
and checks, if there is a corresponding id in table2
and deletes it , if that is the case..
The query you have has a semi-join,
DELETE A FROM table1 WHERE EXISTS ( SELECT 1 FROM table2 B WITH WHERE B.id = A.id )
Maybe you would have written it as:
DELETE A
FROM table1
inner join table2 B on B.id = A.id
A semi-join can be more efficient. You can look up what a semi-join is here and here, for performance check out this article.