Question

I have specific cleanup code that tries to remove some duplicates.

This runs perfectly on many customer sites. The logs tell me that at least 1 sec up to 45 sec is consumed by this query:

DELETE FROM [tbl]
WHERE [Id] NOT IN
(
    SELECT MIN([Id])
    FROM [tbl]
    GROUP BY [IdProject], [IdRepresentative], [TimeStart]
) 

But I have a customer where this query runs for more than 4 hours (up to now and not ending)! I checked the DB (DBCC CHECKDB), I already update the statistics (sp_updatestats), also UPDATE STATISTICS [tbl] WITH FULLSCAN shows no change.

I have the original backup of DB from the customer. I run it on an SQL Server 14.0.2002.14. I have Standard Edition, the customer uses the Express Edition.

I can see in activity monitor that no one else is using the DB. There are no waits and the CPU is used by 25% (exactly 1 of my 4 CPUs). Also in this my test case no one else is using the DB.

I reformed the query and checked this statement:

DELETE FROM [tbl]
FROM [tbl] AS t
LEFT OUTER JOIN 
    (
        SELECT MIN([Id]) AS [IdMin]
        FROM [tbl]
        GROUP BY [IdProject], [IdRepresentative], [TimeStart]
    ) AS d ON d.[IdMin]=t.[Id]
WHERE d.[IdMin] IS NULL

This statement executes in just 1-4 seconds on the same DB.

What can I do with the table or the SQL DB to speed it up?

For me it seems to be a specific problem with DB situation/SQL Server Version. We have never seen this behavior on nearly 100 other sites.

The question is not about discussing that the second DELETE with JOIN style is better. I know this. But we have this other code currently in the production and I can't change it on the fly, but I want to make the customer happy.

Id is not nullable. It is a primary clustered ID. Creating an index is not an option. Because I can't influence the current running system. Something must be physically different.

There are definitely no locks! I just used a stand alone machine with the backup of the DB. And I just executed this single statement inside the management studio.

The Query Plan

Query Plan

Output sp_whoisactive

00 00:03:46.523;54;<?query -- DELETE FROM [tblSchedTimeline] WHERE [Id] NOT IN  (       SELECT MIN([Id])        FROM [tblSchedTimeline]         GROUP BY [IdProject], [IdRepresentative], [TimeStart]   )  --?>;DESKTOP-QV3K54L\Test;NULL;"            224,653";"                  0";"                  0";NULL;"          2,393,069";"                  0";"              1,225";"<ShowPlanXML xmlns=""http://schemas.microsoft.com/sqlserver/2004/07/showplan"" Version=""1.6"" Build=""14.0.2002.14""><BatchSequence><Batch><Statements><StmtSimple StatementText=""DELETE FROM [tblSchedTimeline]&#xD;&#xA;WHERE [Id] NOT IN&#xD;&#xA;&#x9;(&#xD;&#xA;&#x9;&#x9;SELECT MIN([Id])&#xD;&#xA;&#x9;&#x9;FROM [tblSchedTimeline]&#xD;&#xA;&#x9;&#x9;GROUP BY [IdProject], [IdRepresentative], [TimeStart]&#xD;&#xA;&#x9;)"" StatementId=""1"" StatementCompId=""1"" StatementType=""DELETE"" RetrievedFromCache=""true"" StatementSubTreeCost=""91.3449"" StatementEstRows=""257246"" SecurityPolicyApplied=""false"" StatementOptmLevel=""FULL"" QueryHash=""0x527453AF47051791"" QueryPlanHash=""0x1988C324845A2D73"" CardinalityEstimationModelVersion=""120""><StatementSetOptions QUOTED_IDENTIFIER=""true"" ARITHABORT=""true"" CONCAT_NULL_YIELDS_NULL=""true"" ANSI_NULLS=""true"" ANSI_PADDING=""true"" ANSI_WARNINGS=""true"" NUMERIC_ROUNDABORT=""false"" /><QueryPlan CachedPlanSize=""64"" CompileTime=""458"" CompileCPU=""16"" CompileMemory=""584""><MemoryGrantInfo SerialRequiredMemory=""512"" SerialDesiredMemory=""21608"" /><OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant=""104844"" EstimatedPagesCached=""26211"" EstimatedAvailableDegreeOfParallelism=""2"" MaxCompileMemory=""1414704"" /><OptimizerStatsUsage><StatisticsInfo LastUpdate=""2019-01-23T09:09:49.14"" ModificationCount=""37344"" SamplingPercent=""28.5972"" Statistics=""[PK__tblSched__3214EC076837DC08]"" Table=""[tblSchedTimeline]"" Schema=""[dbo]"" Database=""[AGVIP-KCC]"" /></OptimizerStatsUsage><RelOp NodeId=""0"" PhysicalOp=""Index Delete"" LogicalOp=""Delete"" EstimateRows=""257246"" EstimateIO=""7.9627"" EstimateCPU=""0.257246"" AvgRowSize=""9"" EstimatedTotalSubtreeCost=""91.3449"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList /><Update WithOrderedPrefetch=""1"" DMLRequestSort=""1""><Object Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Index=""[IDX_SchedTimeline_Ids]"" IndexKind=""NonClustered"" Storage=""RowStore"" /><RelOp NodeId=""2"" PhysicalOp=""Sort"" LogicalOp=""Sort"" EstimateRows=""257246"" EstimateIO=""0.0112613"" EstimateCPU=""21.2216"" AvgRowSize=""27"" EstimatedTotalSubtreeCost=""83.125"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdProject"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdRepresentative"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""TimeStart"" /></OutputList><MemoryFractions Input=""1"" Output=""1"" /><Sort Distinct=""0""><OrderBy><OrderByColumn Ascending=""1""><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdProject"" /></OrderByColumn><OrderByColumn Ascending=""1""><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdRepresentative"" /></OrderByColumn><OrderByColumn Ascending=""1""><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""TimeStart"" /></OrderByColumn><OrderByColumn Ascending=""1""><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></OrderByColumn></OrderBy><RelOp NodeId=""3"" PhysicalOp=""Clustered Index Delete"" LogicalOp=""Delete"" EstimateRows=""257246"" EstimateIO=""30.7735"" EstimateCPU=""0.257246"" AvgRowSize=""27"" EstimatedTotalSubtreeCost=""61.8921"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdProject"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdRepresentative"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""TimeStart"" /></OutputList><Update WithOrderedPrefetch=""1"" DMLRequestSort=""1""><Object Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Index=""[PK__tblSched__3214EC076837DC08]"" IndexKind=""Clustered"" Storage=""RowStore"" /><RelOp NodeId=""5"" PhysicalOp=""Table Spool"" LogicalOp=""Eager Spool"" EstimateRows=""257246"" EstimateIO=""0.013125"" EstimateCPU=""0.0927087"" AvgRowSize=""11"" EstimatedTotalSubtreeCost=""30.8613"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></OutputList><Spool><RelOp NodeId=""6"" PhysicalOp=""Nested Loops"" LogicalOp=""Left Anti Semi Join"" EstimateRows=""257246"" EstimateIO=""0"" EstimateCPU=""4.18e-006"" AvgRowSize=""11"" EstimatedTotalSubtreeCost=""30.7555"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></OutputList><NestedLoops Optimized=""0""><OuterReferences><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></OuterReferences><RelOp NodeId=""7"" PhysicalOp=""Sort"" LogicalOp=""Sort"" EstimateRows=""1"" EstimateIO=""0.0112613"" EstimateCPU=""0.000100011"" AvgRowSize=""11"" EstimatedTotalSubtreeCost=""29.3753"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></OutputList><MemoryFractions Input=""1"" Output=""1"" /><Sort Distinct=""0""><OrderBy><OrderByColumn Ascending=""1""><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></OrderByColumn></OrderBy><RelOp NodeId=""8"" PhysicalOp=""Nested Loops"" LogicalOp=""Left Anti Semi Join"" EstimateRows=""1"" EstimateIO=""0"" EstimateCPU=""1.07529"" AvgRowSize=""11"" EstimatedTotalSubtreeCost=""29.3639"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></OutputList><NestedLoops Optimized=""0""><RelOp NodeId=""9"" PhysicalOp=""Index Scan"" LogicalOp=""Index Scan"" EstimateRows=""257246"" EstimatedRowsRead=""257246"" EstimateIO=""0.874977"" EstimateCPU=""0.283128"" AvgRowSize=""11"" EstimatedTotalSubtreeCost=""1.1581"" TableCardinality=""257246"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></OutputList><IndexScan Ordered=""1"" ScanDirection=""FORWARD"" ForcedIndex=""0"" ForceSeek=""0"" ForceScan=""0"" NoExpandHint=""0"" Storage=""RowStore""><DefinedValues><DefinedValue><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></DefinedValue></DefinedValues><Object Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Index=""[IDX_SchedTimeline_Ids]"" TableReferenceId=""1"" IndexKind=""NonClustered"" Storage=""RowStore"" /></IndexScan></RelOp><RelOp NodeId=""10"" PhysicalOp=""Row Count Spool"" LogicalOp=""Lazy Spool"" EstimateRows=""1"" EstimateIO=""0"" EstimateCPU=""0.0001001"" AvgRowSize=""9"" EstimatedTotalSubtreeCost=""27.1305"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""257245"" EstimatedExecutionMode=""Row""><OutputList /><RowCountSpool><RelOp NodeId=""11"" PhysicalOp=""Filter"" LogicalOp=""Filter"" EstimateRows=""1"" EstimateIO=""0"" EstimateCPU=""0.0331891"" AvgRowSize=""9"" EstimatedTotalSubtreeCost=""1.38021"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList /><Filter StartupExpression=""0""><RelOp NodeId=""12"" PhysicalOp=""Stream Aggregate"" LogicalOp=""Aggregate"" EstimateRows=""69144"" EstimateIO=""0"" EstimateCPU=""0.18892"" AvgRowSize=""11"" EstimatedTotalSubtreeCost=""1.34702"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Column=""Expr1004"" /></OutputList><StreamAggregate><DefinedValues><DefinedValue><ColumnReference Column=""Expr1004"" /><ScalarOperator ScalarString=""MIN([AGVIP-KCC].[dbo].[tblSchedTimeline].[Id])""><Aggregate Distinct=""0"" AggType=""MIN""><ScalarOperator><Identifier><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></Identifier></ScalarOperator></Aggregate></ScalarOperator></DefinedValue></DefinedValues><GroupBy><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdProject"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdRepresentative"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""TimeStart"" /></GroupBy><RelOp NodeId=""13"" PhysicalOp=""Index Scan"" LogicalOp=""Index Scan"" EstimateRows=""257246"" EstimatedRowsRead=""257246"" EstimateIO=""0.874977"" EstimateCPU=""0.283128"" AvgRowSize=""27"" EstimatedTotalSubtreeCost=""1.1581"" TableCardinality=""257246"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdProject"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdRepresentative"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""TimeStart"" /></OutputList><IndexScan Ordered=""1"" ScanDirection=""FORWARD"" ForcedIndex=""0"" ForceSeek=""0"" ForceScan=""0"" NoExpandHint=""0"" Storage=""RowStore""><DefinedValues><DefinedValue><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></DefinedValue><DefinedValue><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdProject"" /></DefinedValue><DefinedValue><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdRepresentative"" /></DefinedValue><DefinedValue><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""TimeStart"" /></DefinedValue></DefinedValues><Object Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Index=""[IDX_SchedTimeline_Ids]"" TableReferenceId=""2"" IndexKind=""NonClustered"" Storage=""RowStore"" /></IndexScan></RelOp></StreamAggregate></RelOp><Predicate><ScalarOperator ScalarString=""[Expr1004] IS NULL""><Compare CompareOp=""IS""><ScalarOperator><Identifier><ColumnReference Column=""Expr1004"" /></Identifier></ScalarOperator><ScalarOperator><Const ConstValue=""NULL"" /></ScalarOperator></Compare></ScalarOperator></Predicate></Filter></RelOp></RowCountSpool></RelOp></NestedLoops></RelOp></Sort></RelOp><RelOp NodeId=""14"" PhysicalOp=""Filter"" LogicalOp=""Filter"" EstimateRows=""1"" EstimateIO=""0"" EstimateCPU=""0.0331891"" AvgRowSize=""9"" EstimatedTotalSubtreeCost=""1.38021"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList /><Filter StartupExpression=""0""><RelOp NodeId=""15"" PhysicalOp=""Stream Aggregate"" LogicalOp=""Aggregate"" EstimateRows=""69144"" EstimateIO=""0"" EstimateCPU=""0.18892"" AvgRowSize=""11"" EstimatedTotalSubtreeCost=""1.34702"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Column=""Expr1004"" /></OutputList><StreamAggregate><DefinedValues><DefinedValue><ColumnReference Column=""Expr1004"" /><ScalarOperator ScalarString=""MIN([AGVIP-KCC].[dbo].[tblSchedTimeline].[Id])""><Aggregate Distinct=""0"" AggType=""MIN""><ScalarOperator><Identifier><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></Identifier></ScalarOperator></Aggregate></ScalarOperator></DefinedValue></DefinedValues><GroupBy><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdProject"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdRepresentative"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""TimeStart"" /></GroupBy><RelOp NodeId=""16"" PhysicalOp=""Index Scan"" LogicalOp=""Index Scan"" EstimateRows=""257246"" EstimatedRowsRead=""257246"" EstimateIO=""0.874977"" EstimateCPU=""0.283128"" AvgRowSize=""27"" EstimatedTotalSubtreeCost=""1.1581"" TableCardinality=""257246"" Parallel=""0"" EstimateRebinds=""0"" EstimateRewinds=""0"" EstimatedExecutionMode=""Row""><OutputList><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdProject"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdRepresentative"" /><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""TimeStart"" /></OutputList><IndexScan Ordered=""1"" ScanDirection=""FORWARD"" ForcedIndex=""0"" ForceSeek=""0"" ForceScan=""0"" NoExpandHint=""0"" Storage=""RowStore""><DefinedValues><DefinedValue><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></DefinedValue><DefinedValue><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdProject"" /></DefinedValue><DefinedValue><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""IdRepresentative"" /></DefinedValue><DefinedValue><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""TimeStart"" /></DefinedValue></DefinedValues><Object Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Index=""[IDX_SchedTimeline_Ids]"" TableReferenceId=""2"" IndexKind=""NonClustered"" Storage=""RowStore"" /></IndexScan></RelOp></StreamAggregate></RelOp><Predicate><ScalarOperator ScalarString=""[AGVIP-KCC].[dbo].[tblSchedTimeline].[Id]=[Expr1004]""><Compare CompareOp=""EQ""><ScalarOperator><Identifier><ColumnReference Database=""[AGVIP-KCC]"" Schema=""[dbo]"" Table=""[tblSchedTimeline]"" Column=""Id"" /></Identifier></ScalarOperator><ScalarOperator><Identifier><ColumnReference Column=""Expr1004"" /></Identifier></ScalarOperator></Compare></ScalarOperator></Predicate></Filter></RelOp></NestedLoops></RelOp></Spool></RelOp></Update></RelOp></Sort></RelOp></Update></RelOp></QueryPlan></StmtSimple></Statements></Batch></BatchSequence></ShowPlanXML>";"              2,705";runnable;"                  2";NULL;DESKTOP-QV3K54L;AGVIP-KCC;Microsoft SQL Server Management Studio - Abfrage;2019-02-05 15:35:50.680;2019-02-05 15:35:50.677;0;2019-02-05 15:39:37.297

Output of sp_spaceused

name    rows    reserved    data    index_size  unused
tblSchedTimeline    257246                  50280 KB    36432 KB    9720 KB 4128 KB
Was it helpful?

Solution

This part of the plan is the problem.

enter image description here

Issue

The correct behaviour if the subquery brings back any NULL is to return 0 rows from the NOT IN.

Even if ID is not nullable (and therefore MIN(ID) cannot possibly be NULL when used as a vector aggregate) the datatype of MIN(ID) is regarded as nullable (it can still return NULL when used as a scalar aggregate against an empty table for example).

So you have this extra row count spool added to the plan whose job it is to ensure (in conjunction with an anti semi join) that no rows are emitted if a NULL is returned by the subquery.

Unfortunately even though the rows eliminated by the anti semi join on this spool are likely to be 0 and all 257,246 rows will flow onto the next operator the cardinality estimation reduces the estimated number of rows going past that step to 1.

As a result it has a scan of the table on the inside of the nested loops with estimated 1 execution whereas in reality it will scan and aggregate the whole table 257,246 times.

The one-row estimate coming out of the Anti Semi join is a known bug that was fixed under trace flag 4199 quite a while ago now. See the related Q & A Workaround for Anti-Semi Join bug for some more background and links.

Solution

The bug only manifests on SQL Server 2017 for you because you have compatibility level 120 selected.

You should find you get a much better estimate for the Anti Semi Join with trace flag 4199 active, an OPTION (QUERYTRACEON 4199) hint, a OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES')) hint (directly or via plan guide) or for the database:

ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = ON;

The use hint QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140 is another option as of SQL Server 2017 CU10.

Which option you choose depends on how widely you want the optimizer hotfixes to apply. Compatibility levels are intended to be rather short-term things, so you should be planning to move to a more current setting, where this particular optimizer fix is on by default.

Repro

The following script reproduces the problem and a fix:

ALTER DATABASE CURRENT 
SET COMPATIBILITY_LEVEL = 120;
GO
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = OFF;
GO
DROP TABLE IF EXISTS dbo.tbl;
GO
CREATE TABLE dbo.tbl
(
    Id integer PRIMARY KEY, 
    IdProject integer NOT NULL, 
    IdRepresentative integer NOT NULL, 
    TimeStart datetime NOT NULL,

    INDEX i NONCLUSTERED
    (
        TimeStart, 
        IdRepresentative, 
        IdProject
    )
);
GO
UPDATE STATISTICS dbo.tbl 
WITH 
    ROWCOUNT = 257246, 
    PAGECOUNT = 25725;
DELETE FROM [tbl]
WHERE [Id] NOT IN
(
    SELECT MIN([Id])
    FROM [tbl]
    GROUP BY [IdProject], [IdRepresentative], [TimeStart]
) 
OPTION 
(
    MAXDOP 1
);

Bug

DELETE FROM [tbl]
WHERE [Id] NOT IN
(
    SELECT MIN([Id])
    FROM [tbl]
    GROUP BY [IdProject], [IdRepresentative], [TimeStart]
) 
OPTION 
(
    MAXDOP 1,
    USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES')
);

with fix

Alternative Syntax

Ideally you should also rewrite the query to not use the problematic NOT IN. A possible alternative, that is likely to be more efficient even with the above fix, would be

DELETE T
FROM   (SELECT ROW_NUMBER() OVER (PARTITION BY IdProject, IdRepresentative, TimeStart ORDER BY Id) AS RN
        FROM   tbl) T
WHERE  RN > 1 

OTHER TIPS

code currently in the production and I can't change it on the fly

Creating an index is not an option. Because and can't influence the current running system.

If you can't alter the query or the schema, and you of course have no control over the data, then the only other option is to throw hardware at the problem and I assume that is out of the question too!

the CPU is used by 25% (exactly 1 of my 4 CPUs)

To possibilities here: the query plan has the server spinning running that sub-query for every row in tbl. Other than changing the statement in a manner similar to that seen in your question or changing/checking indexes (you need something on at least [IdProject], perhaps a wider index on [IdProject], [IdRepresentative], [TimeStart] you are possibly stuck in that regard.

Perhaps check to make sure that all the indexes you expect are present and enabled. Maybe update their stats in case stale information there is why the planner is doing something unusual.

Another consideration is that it might not be your query that is taking the time and consuming CPU: there could be another long-running transaction that is holding locks that forces it to queue. Check with the undocumented sp_who2 which includes information about what sessions are blocked by others if they are. If your query has little or no CPU & IO time recorded and a value in the BlkBy column then this is what has occurred. Or better still if you are allowed to add it to the DB (or local master), use sp_whoisactive which offers more detail and diagnodtics options. If you can't use sp_whoisactive because installing it would be a schema change you don't have permission for, look at the code to see what system views/tables/ether it is using and write a query to do the same yourself without needing to install a procedure.

Without further information we can't given any more detailed advice than this. And even then if you can't touch the statement nor the structure your options are going to be limited. Add to the question query plans as already suggested, also table/index definitions, and approximate size of the table: EXEC sp_spaceused 'tbl' will give both the number of rows and the about of pages consumed (that may also get queued, if so try SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; EXEC sp_spaceused 'tbl').

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top