Why does SQL Server do lots of unnecessary clustered key lookups in this SELECT - OFFSET - FETCH?

https://dba.stackexchange.com/questions/286492

17-03-2021
|

Question

I have the following table:

CREATE TABLE [dbo].[MP_Notification_Audit](
    [id] [bigint] IDENTITY(1,1) NOT NULL,
    [type] [int] NOT NULL,
    [source_user_id] [bigint] NOT NULL,
    [target_user_id] [bigint] NOT NULL,
    [discussion_id] [bigint] NULL,
    [discussion_comment_id] [bigint] NULL,
    [discussion_media_id] [bigint] NULL,
    [patient_id] [bigint] NULL,
    [task_id] [bigint] NULL,
    [date_created] [datetimeoffset](7) NOT NULL,
    [clicked] [bit] NULL,
    [date_clicked] [datetimeoffset](7) NULL,
    [title] [nvarchar](max) NULL,
    [body] [nvarchar](max) NULL,
 CONSTRAINT [PK_MP_Notification_Audit] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

ALTER TABLE [dbo].[MP_Notification_Audit] ADD  CONSTRAINT [DF_MP_Notification_Audit_date_created]  DEFAULT (sysdatetimeoffset()) FOR [date_created]
GO

CREATE NONCLUSTERED INDEX [IX_MP_Notification_Audit_TargetUserDateCreated] ON [dbo].[MP_Notification_Audit]
(
    [target_user_id] ASC,
    [date_created] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

There are over 10000 rows in the table with a [target_user_id] of 100017.

When I execute the following query:

SELECT
    [target_user_id], [patient_id]
FROM
    [dbo].[MP_Notification_Audit]
WHERE
    [target_user_id] = 100017
ORDER BY
    [date_created] ASC
OFFSET 9200 ROWS
FETCH NEXT 10 ROWS ONLY

... I get the following actual execution plan:

Why did SQL Server need to do 9210 instead of 10 clustered key lookups? The index [IX_MP_Notification_Audit_TargetUserDateCreated] should've allowed it to figure out the 10 RIDs it needed to retrieve to get [patient_id], and only do 10 clustered key lookups, right?

I've also discovered some more odd behaviour - it looks like SQL Server 'punishes' you for not selecting unindexable columns. If I instead OFFSET 10000 rows, I get the following execution plan:

SELECT
    [target_user_id], [patient_id]
FROM
    [dbo].[MP_Notification_Audit]
WHERE
    [target_user_id] = 100017
ORDER BY
    [date_created] ASC
OFFSET 10000 ROWS
FETCH NEXT 10 ROWS ONLY

... with the recommendation of creating an index that includes [patient_id], and an inefficient clustered index scan for the whole table. The time taken was 0.126s, however this could apparently have been a lot better because when I add the unindexable column [title] to the query, I get this:

SELECT
    [target_user_id], [patient_id], [title]
FROM
    [dbo].[MP_Notification_Audit]
WHERE
    [target_user_id] = 100017
ORDER BY
    [date_created] ASC
OFFSET 10000 ROWS
FETCH NEXT 10 ROWS ONLY

... and the nonclustered index is still used, the time taken only being 0.032s. Does SQL Server basically say "you could've created an index to do this more efficiently, so we're not even going to use the index you do have and we're going to do the lookup inefficiently to punish you", or am I missing something?

Solution

I agree the optimizer could be smarter about where the key lookup should occur in the plan with OFFSET and FETCH.

As a workaround, you could use a CTE like below.

WITH Top10Keys AS (
    SELECT
        ID, date_created
    FROM
        [dbo].[MP_Notification_Audit]
    WHERE
        [target_user_id] = 100017
    ORDER BY
        [date_created] ASC
    OFFSET 9200 ROWS
    FETCH NEXT 10 ROWS ONLY
)
SELECT
    [target_user_id], [patient_id]
FROM Top10Keys
JOIN [dbo].[MP_Notification_Audit] AS AdditionalData ON AdditionalData.id = Top10Keys.id
ORDER BY Top10Keys.date_created;

The nested loop key lookup is then done after the TOP operator in the plan.

OTHER TIPS

It's because the order of operations that the clauses of your query are executed in. The WHERE clause occurs before your OFFSET and FETCH clauses. That's why not only are there about 10,000 (9,000 in your first example) key lookups, but there's just as many index seeks. This is the filtering that occurs as a result of your predicate on target_user_id in your WHERE clause. Execution Plans read from right to left for order of events, so if you follow the plan starting from the index seeks and work your way left you'll see the Top operator which is what represents your OFFSET / FETCH clauses, and thus comes after your WHERE clause.

So simply put, SQL Server first needs to locate the rows based on the filters you apply via your predicates (in the WHERE, HAVING, and JOIN clauses). Then once it finds the correct rows it can apply your OFFSET / FETCH clauses. If it didn't get all the rows based on your filters first, it wouldn't know what rows it'll need to OFFSET / FETCH.

Could have the SQL Server Engine been programmed a little bit logically better in such a case? Possibly. They could've designed the Engine in such a way that realizes a Top operation is going to eventually occur on that index seeked data and apply it before the key lookups occur to at least minimize the amount of key lookups but I would imagine it would only help performance in certain cases, and make things programmatically a lot more complicated that it may not be worth it. For example, the key lookups are happening in parallel to the index seek operation (hence them being equally to the right in the plan steps). If the Top operation were to occur before the key lookups to reduce the data, then the remaining rows key lookups would have to happen serially after that Top operation, resultantly serially after the index seek operation as well, which would be worse performance-wise in some cases.

In regards to your second question on what you've experienced with the execution plan changing from index seeks to clustered index scans back to index seeks again, this is known as the Tipping Point. Basically the Query Optimizer analyzes your query and quickly calculates the cost of each operation among a series of different execution plans it can use to fetch your data, partly based on cached statistics of your data that SQL Server maintains. The sum total of those costs allows the Query Optimizer to choose the execution plan with the lowest cost (ideally the fastest execution plan).

Those cached statistics are based on how many rows exist for each value in each column out of the total rows in the table and are used to do cardinality estimates on each operation that needs to occur. In short, cardinality estimation is the number of rows SQL Server thinks a particular operation is going to return. For example, your WHERE clause in your first example returns approximately 9,000 rows and the SQL Server Engine probably estimated the cardinality to be close to that number, which resulted in the cost of an index seek operation being sufficient enough that an index seek was chosen.

When you explicitly told SQL Server to return 10,000 rows instead, it likely tripped the Tipping Point, which made it think the amount of rows it needs to return will be more efficient with an index scan operation instead, because the cardinality of rows is high enough that the SQL Engine thinks a scan will likely encounter more contiguous rows efficiently as opposed to doing 10,000 index seeks. (Long story short, the Optimizer isn't perfect, especially when dancing so close to the tipping point, and an index seek is probably still more efficient here.)

As to why when you added another unindexed column to your SELECT list causes the Query Optimizer to choose a plan not over the tipping point, I couldn't precisely tell you for sure (especially without the actual execution plan in front of me) because it's pretty complicated what's going on under the hood, but generally speaking the estimations it did, based on the cached statistics of your data at that point in time, resulted in it calculating costs that were not over the tipping point any more, and resulted in it choosing an execution plan that leverages index seeks instead.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange