Question

I am getting very, very wrong estimates for the clustered index seek step of this query (estimated rows is 8,637,530; actual is 74,723):

Declare @StartDate as date = '10/1/2014';
Declare @EndDate as date = '10/2/2014';

select Sum(Quantity)
from Daily_GC_Items
where SalesDate between @StartDate and @EndDate;

If I use string literals for the dates like this, the estimate is almost perfect (estimated rows is 75,337.4; actual is 74,723):

select Sum(Quantity)
from Daily_GC_Items
where SalesDate between '10/1/2014' and '10/2/2014'

Table definition:

[SalesDate] [date] NOT NULL,
[Store] [varchar](10) NOT NULL,
[GuestCheckNumber] [smallint] NOT NULL,
[ItemSequence] [smallint] NOT NULL,
[Quantity] [smallint] NOT NULL,
(Several more columns)

Primary Key:

[SalesDate] ASC,
[Store] ASC,
[GuestCheckNumber] ASC,
[ItemSequence] ASC

What are the possible causes for this behavior?

Was it helpful?

Solution

...SSMS is not running this using sp_execute so I don't think this is being caused by parameter sniffing. What are the possible causes for this behavior?

The optimizer cannot 'sniff' the values of local variables, so the cardinality estimate is based on a guess. If you are using the original cardinality estimator, the fixed guess for BETWEEN is 9% of the table's cardinality.

If using the new cardinality estimator, the guess is computed using exponential backoff, applied to the separate >= and <= comparisons that BETWEEN is shorthand for. The guess for >= and <= is 30% (0.3) each, so the overall guess is 0.3 * SQRT(0.3) = 0.164317 (* table cardinality).

You have to decide between caching and reusing a reasonable plan for all future variable values, or recompiling on each execution to generate a new plan for the specific values each time. The second option makes use of the Parameter Embedding Optimization, which is only applied if you use the query hint OPTION (RECOMPILE). There is an obvious trade-off to be made here, between the cost of recompiling each time (the resulting plan is never cached for reuse) and using a not-quite-optimal reused plan for each execution.

There are additional options for you to consider, including specifying typical values in an OPTIMIZE FOR hint; see my SQLPerformance.com article for details.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top