Bad estimates when sorting, good estimates without sort

https://dba.stackexchange.com/questions/269282

03-03-2021

Question

I'm struggling with the tuning of one of the queries on our production SQL Server 2016 SP2. I needed to anonymize the query due to security policies. The query is like this:

    SELECT TOP 1000
        Object1.Column1, Object1.Column2, Object1.Column3, Object1.Column4,
        Object1.Column5, Object1.Column6, Object1.Column7, Object1.Column8,
        Object1.Column9, Object1.Column10, Object1.Column11, Object1.Column12,
        Object1.Column13, Object1.Column14, Object1.Column15, Object1.Column16,
        Object1.Column17, Object1.Column18, Object1.Column19, Object1.Column20,
        Object1.Column21, Object1.Column22, Object1.Column23, Object1.Column24,
        Object1.Column25, Object1.Column26, Object1.Column27, Object1.Column28,
        Object1.Column29, Object1.Column30, Object1.Column31, Object1.Column32,
        Object1.Column33, Object1.Column34, Object1.Column35, Object1.Column36,
        Object1.Column37, Object1.Column38, Object1.Column39, Object1.Column40,
        Object1.Column41, Object1.Column42, Object1.Column43, Object1.Column44,
        Object1.Column45, Object1.Column46, Object1.Column47, Object1.Column48,
        Object1.Column49, Object1.Column50, Object1.Column51, Object1.Column52
    FROM Schema1.Object2 Object1
    LEFT OUTER JOIN Schema1.Object3 Object4 ON Object4.Column3 = Object1.Column3
    WHERE ((Object1.Column43 IS NULL OR Object1.Column43 = '')
        AND (Object4.Column54 = 0 OR Object4.Column54 IS NULL)
        AND Object1.Column3 = N'SomeValue')
    ORDER BY Object1.Column3 ASC, Object1.Column8 ASC, Object1.Column7 ASC,
        Object1.Column12 ASC, Object1.Column13 ASC, Object1.Column1 ASC

The actual query plan can be found here.
As you can see, an index seek is done on Object2.Index1.Object1, but the estimates are terrible: SQL Server estimated that 194907 rows would come back, but only 710 rows were returned.

When I remove the ORDER BY clause, the estimates are much better: SQL Server expected only 1181 rows to come back. The plan can be found here.

What I do not understand is why an ORDER BY can influence the estimates of an index seek. Both queries are run with the same value for column3.
Can somebody explain this to me?

Solution

"why an ORDER BY can influence the estimates of an index seek. Can somebody explain this to me?"

This is a top values query. The ORDER BY specifies which 1000 rows the query will return. Without the ORDER BY, SQL Server is free to return any 1000 rows, so once it has 1000 rows from Schema1.Object2 that qualify in the WHERE clause, it can stop. The LEFT JOIN by itself can't reduce the number of rows.

Specifically, the no-ORDER BY plan has a row goal when it reads Object2:

    <RelOp AvgRowSize="698" EstimateCPU="0.814711" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"
           EstimateRows="1181.27" EstimateRowsWithoutRowGoal="194907"
           LogicalOp="Inner Join" NodeId="3" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="3.89162">

The ORDER BY plan does not. It needs to sort all filtered rows before it can figure out which 1000 to return, so the index seek is estimated at the full 194907 rows:

    <RelOp AvgRowSize="51" EstimateCPU="0.214555" EstimateIO="1.3572" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"
           EstimateRows="194907"
           EstimatedRowsRead="194907" LogicalOp="Index Seek" NodeId="17" Parallel="false" PhysicalOp="Index Seek" EstimatedTotalSubtreeCost="1.57175" TableCardinality="42781400">
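
One way to check that the row goal accounts for the difference is to run the no-ORDER BY query with row goals turned off and compare the estimates. The sketch below is only an illustration: it assumes SQL Server 2016 SP1 or later, where the USE HINT query hint with DISABLE_OPTIMIZER_ROWGOAL is available, and it shortens the column list for readability. When testing for real, keep the full column list from the question, since a different select list could lead to a different plan.

```sql
-- Sketch only: the anonymized query from the question, without ORDER BY.
-- The column list is shortened here; use the full list when testing.
SELECT TOP (1000)
       Object1.Column1, Object1.Column3, Object1.Column8
FROM Schema1.Object2 AS Object1
LEFT OUTER JOIN Schema1.Object3 AS Object4
       ON Object4.Column3 = Object1.Column3
WHERE (Object1.Column43 IS NULL OR Object1.Column43 = '')
  AND (Object4.Column54 = 0 OR Object4.Column54 IS NULL)
  AND Object1.Column3 = N'SomeValue'
-- Assumption: SQL Server 2016 SP1+ so that USE HINT is recognized.
-- The hint asks the optimizer not to scale estimates down for the TOP.
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));
```

If the row goal is responsible, the estimates in this plan should move back toward the EstimateRowsWithoutRowGoal value (194907) instead of 1181. The hint is meant here as a diagnostic for comparing plans, not as a recommended fix.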

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange