How does SQL Server process DELETE WHERE EXISTS (SELECT 1 FROM TABLE)?

https://dba.stackexchange.com/questions/282993

13-03-2021
|

Pergunta

The below is a valid TSQL statement. But I want to understand how SQL Server processes it.

DELETE A
FROM table1 WITH (NOLOCK)
WHERE
    EXISTS
    (
    SELECT 1
    FROM
        table2 B WITH (NOLOCK)
    WHERE
        B.id = A.id
    )

Because the output of the subquery will be a list of 1s. How does SQL Server know which rows to delete?

Solução

How does SQL Server know which rows to delete?

To understand how it's processed, it's probably more helpful to look at the execution plan for the query. Setup scripts are at the end.

First, let's change the query a little bit to something that should throw an error, but doesn't.

DELETE A
FROM table1 AS A
WHERE EXISTS
(
    SELECT 1/0
    FROM table2 B
    WHERE B.id = A.id
);

If you were to just run SELECT 1/0 you'd get a divide by zero error. But there are places where expressions are present in the query for parsing, but aren't actually projected by the optimizer.

The "list of ones" you're referring to doesn't ever really appear. All the work to identify rows to delete is done via the where clause in the exists.

Reading the plan from right to left is the correct method when you need to understand the flow of data. Left to right is the flow of logic.

We have:

A full scan of both tables
A hash join to match rows
A delete from table1

The hash join figures out which rows need to go, and passes qualifying bookmark values on to the delete operator. In this case, bookmarks are used to uniquely identify rows because there's no clustered index available to use keys from.

With a unique clustered index, key values alone can be used to identify rows. With a non-unique clustered index, an additional unique-ifier will accompany any duplicate values, though this won't be visible to you.

Setup scripts

DROP TABLE IF EXISTS table1, table2;

CREATE TABLE dbo.table1 (id int NOT NULL);
CREATE TABLE dbo.table2 (id int NOT NULL);

INSERT 
    dbo.table1 WITH(TABLOCK) (id) 
SELECT 
    x.*
FROM 
(
    SELECT 
        ROW_NUMBER() OVER(ORDER BY 1/0) AS n 
    FROM sys.messages AS m
) AS x;

INSERT 
    dbo.table2 WITH(TABLOCK) (id)
SELECT 
    t.*
FROM dbo.table1 AS t
WHERE id % 2 = 0;

SELECT TOP (100) t1.* FROM dbo.table1 AS t1;
SELECT TOP (100) t2.* FROM dbo.table2 AS t2;

Outras dicas

the server goes through every row of table1 and checks, if there is a corresponding id in table2 and deletes it , if that is the case..

The query you have has a semi-join,

DELETE A FROM table1 WHERE EXISTS ( SELECT 1 FROM table2 B WITH WHERE B.id = A.id )

Maybe you would have written it as:

DELETE A
FROM table1
inner join table2 B on B.id = A.id

A semi-join can be more efficient. You can look up what a semi-join is here and here, for performance check out this article.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a dba.stackexchange