Speed Up Cross Apply Without Index Hint

https://dba.stackexchange.com/questions/208822

02-01-2021
|

Question

I have a very small table with 12 rows in it that can be created with the following statement:

CREATE TABLE dbo.SmallTable(ScoreMonth tinyint NOT NULL PRIMARY KEY,
                            ScoreGoal float NOT NULL
                           );

I have another table with ≈100M rows in it that can be created with the following statments:

CREATE TABLE dbo.SlowCrossApply(RecordKey nvarchar(12) NOT NULL,
                                Score1 decimal(3, 2) NOT NULL,
                                Score2 decimal(3, 2) NOT NULL,
                                Score3 decimal(3, 2) NOT NULL,
                                Score4 decimal(3, 2) NOT NULL,
                                Score5 decimal(3, 2) NOT NULL,
                                Score6 decimal(3, 2) NOT NULL,
                                FromToday bit NOT NULL
                               );

ALTER TABLE dbo.SlowCrossApply ADD CONSTRAINT i01PK PRIMARY KEY CLUSTERED(RecordKey ASC)
    WITH(FILLFACTOR = 90, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
         DATA_COMPRESSION = PAGE
        );

CREATE NONCLUSTERED INDEX i02TodayRecords ON dbo.SlowCrossApply(FromToday)
    INCLUDE (Score1, Score2, Score3, Score4, Score5, Score6)
    WHERE FromToday = 1
    WITH(FILLFACTOR = 100, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
         DATA_COMPRESSION = PAGE
        );

i02TodayRecords has ≈1M rows in it. When I run the following query—I struggled formatting it to both look clean and prevent a horizontal scrollbar—it takes over 5 minutes to finish:

SELECT b.RecordKey,
       COALESCE(NULLIF(ROUND(((0.95 * (ROW_NUMBER() OVER(PARTITION BY a.Prefix
                                                         ORDER BY b.Score6 ASC
                                                        ) - 1
                                      )
                              )
                              / COALESCE(NULLIF(COUNT(*) OVER(PARTITION BY a.Prefix) - 1, 0
                                               ), 1
                                        )
                             ) + 0.005, 2
                            ), 0.96
                      ), 0.95
               ) AS NewScore
FROM (SELECT LEFT(s.RecordKey, 2) AS Prefix,
             CAST(ROUND(sm.ScoreGoal * COUNT(*), 0) AS int) AS Quant
      FROM dbo.SlowCrossApply AS s
      CROSS JOIN dbo.SmallTable AS sm
      WHERE s.FromToday = 1 AND sm.ScoreMonth = MONTH(GETDATE())
      GROUP BY LEFT(s.RecordKey, 2), sm.ScoreGoal
     ) AS a
CROSS APPLY (SELECT TOP(a.Quant) s2.RecordKey, s2.Score6
             FROM dbo.SlowCrossApply AS s2
             WHERE s2.FromToday = 1 AND s2.Score6 > 0 AND LEFT(s2.RecordKey, 2) = a.Prefix
             ORDER BY s2.Score6 DESC
            ) AS b;

The outer subquery returns only 10 rows; and if I supply a hint to use i02TodayRecords or put the results of the outer subquery in a table variable, it takes less than 1 second. The final result returns just over 8000 rows.

The execution plan shows that 64% of the cost is due to an eager index spool on the clustered index in the Cross Apply portion.

I know the index hint works (at least for now), but I'm hoping to avoid using one. Ideally, I wouldn't go the table variable route either. Is there something I can do to get the query optimizer to "know" to utilize i02TodayRecords? I realize there is a lot more information that is probably important, and I'll do my best to supply said information if requested.

Some potentially useful information: the indexes have less than 1% fragmentation. The statistics for both indexes have been updated via a FULLSCAN, and the database is set to have simple parameterization and parameter sniffing—unfortunately, I can't changes those settings. In regards to the latter, the query optimizer did not replace any values with parameters unlike other simple queries I have run where I was forced to use a hint to utilize a particular filtered index.

Solution

The problem you're likely facing is around SARGability, namely using the LEFT function in your WHERE clause:

LEFT(s2.RecordKey, 2) = a.Prefix

With that in there, you're stuck running the function for every row and then comparing it. You can't index for that, as-is. Putting the transformation into a CTE, view, or derived table wouldn't help, nor would writing a function to perform the manipulation.

One way around that is to create and index a computed column:

ALTER TABLE dbo.SlowCrossApply ADD LeftTwoPrefix AS LEFT(RecordKey, 2);

Which can be indexed. I'm also changing your index definition a bit:

CREATE NONCLUSTERED INDEX TodayRecords_Filtered ON 
dbo.SlowCrossApply(LeftTwoPrefix, FromToday, Score6)
INCLUDE (Score1, Score2, Score3, Score4, Score5)
WHERE FromToday = 1

Another alternative would be to dump the results of your CROSS JOIN into a temp table:

SELECT     LEFT(s.RecordKey, 2) AS Prefix, CAST(ROUND(sm.ScoreGoal * COUNT(*), 0) AS INT) AS Quant
INTO       #yourmom
FROM       dbo.SlowCrossApply AS s
CROSS JOIN dbo.SmallTable AS sm
WHERE      s.FromToday = 1
AND        sm.ScoreMonth = MONTH(GETDATE())
GROUP BY   LEFT(s.RecordKey, 2), sm.ScoreGoal;

You followed up with:

... do you agree with the FromToday [column] being the index key in i02TodayRecords instead of RecordKey, ignoring the specifics of the query I used here?

In this case, it doesn't matter much. Non-unique nonclustered indexes store clustered index key columns in all levels of the nonclustered index. See my post here: Where Clustered Index Keys Dare.

And:

Furthermore, since RecordKey is "random", there would likely be a great deal of fragmentation that would occur if it were the index key as opposed to FromToday which will always be 1. Do those points sound valid?

I generally don't worry about index fragmentation, so no. I aim to create indexes to help queries. A fragmented index is much more helpful than a non-existent index, and in many circumstances you'll never notice the fragmentation.

Hope this helps!

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange