Question

A custom statistic exists for the CacheId column of a table. After an overnight statistics update:

Statistics for INDEX 'ST_TableName_CacheId'.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Name                            Updated                         Rows                            Rows Sampled                    Steps                           Density                         Average Key Length              String Index                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ST_TableName_CacheId Apr 26 2014  2:04AM             121482                          121482                          6                               0                               4                               NO                                                              121482                          

All Density                     Average Length                  Columns                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.1666667                       4                               CacheId                         

Histogram Steps                 
RANGE_HI_KEY                    RANGE_ROWS                      EQ_ROWS                         DISTINCT_RANGE_ROWS             AVG_RANGE_ROWS                  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
39968                           0                               20247                           0                               1                               
40058                           0                               20247                           0                               1                               
40062                           0                               20247                           0                               1                               
40066                           0                               20247                           0                               1                               
40069                           0                               20247                           0                               1                               
41033                           0                               20247                           0                               1                               

1) Performance of a join against an existing data set in this table where CacheId = 41033 perform well with good estimates (23622 vs actual of 20247).

2) Then an insert is performed with CacheId = 41273 of 20247 rows.

3) Then a join against this newly inserted data set shows a poor estimate of 1 row resulting in a bad plan.

4) A manual update of the statistics (which was originally with fullscan) shows a new histogram:

Statistics for INDEX 'ST_TableName_CacheId'.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Name                            Updated                         Rows                            Rows Sampled                    Steps                           Density                         Average Key Length              String Index                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ST_TableName_CacheId Apr 28 2014 10:41AM             141729                          141729                          7                               0                               4                               NO                                                              141729                          

All Density                     Average Length                  Columns                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.1428571                       4                               CacheId                         

Histogram Steps                 
RANGE_HI_KEY                    RANGE_ROWS                      EQ_ROWS                         DISTINCT_RANGE_ROWS             AVG_RANGE_ROWS                  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
39968                           0                               20247                           0                               1                               
40058                           0                               20247                           0                               1                               
40062                           0                               20247                           0                               1                               
40066                           0                               20247                           0                               1                               
40069                           0                               20247                           0                               1                               
41033                           0                               20247                           0                               1                               
41274                           0                               20247                           0                               1                               

5) Running the same join query again for the CacheId = 41274 shows perfect estimates (20247) and good performance.

Q1) Why mathematically is the original estimate so bad? I mean the CacheId's are sparse but not at a ratio of 20000:1.

Q2) As the number of cacheId's increases would you expect the estimates for newly inserted data improve naturally?

Q3) Are there any ways (gulp, tricks or otherwise) to improve the estimate (or make it less certain of 1 row) without having to update the statistics every time a new set of data is inserted (e.g. adding a fake data set at a much larger CacheId = 999999).

Here are the true number of rows for all CacheId's within the table:

CacheId Rows
39968   20247
40058   20247
40062   20247
40066   20247
40069   20247
41033   20247
41274   20247

[ I don't think QP's are needed to answer this question and its some work to clean them up. I can answer specific questions if needed! ]

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top