Why doesn't SQL Server use my index in this SELECT … WHERE?

https://dba.stackexchange.com/questions/286452

17-03-2021
|

Question

I've created a table with a nonclustered PK (this is by design), and an additional nonclustered index on the column I'm filtering with a WHERE clause ([target_user_id]):

CREATE TABLE [dbo].[MP_Notification_Audit] (
    [id]                    BIGINT             IDENTITY (1, 1) NOT NULL,
    [type]                  INT                NOT NULL,
    [source_user_id]        BIGINT             NOT NULL,
    [target_user_id]        BIGINT             NOT NULL,
    [discussion_id]         BIGINT             NULL,
    [discussion_comment_id] BIGINT             NULL,
    [discussion_media_id]   BIGINT             NULL,
    [patient_id]            BIGINT             NULL,
    [task_id]               BIGINT             NULL,
    [date_created]          DATETIMEOFFSET (7) CONSTRAINT [DF_MP_Notification_Audit_date_created] DEFAULT (sysdatetimeoffset()) NOT NULL,
    [clicked]               BIT                NULL,
    [date_clicked]          DATETIMEOFFSET (7) NULL,
    [title]                 NVARCHAR (MAX)     NULL,
    [body]                  NVARCHAR (MAX)     NULL,
    CONSTRAINT [PK_MP_Notification_Audit1] PRIMARY KEY NONCLUSTERED ([id] ASC)
);

[...]

CREATE NONCLUSTERED INDEX [IX_MP_Notification_Audit_TargetUser] ON [dbo].[MP_Notification_Audit]
(
    [target_user_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

This table has about 11,700 rows of data in, so it should be enough to trigger the use of indexes with WHERE clauses. If I SELECT just the column I'm filtering on, only the index is used and 133 matching rows are read - an index-only scan:

SELECT [target_user_id]
  FROM [TestDb].[dbo].[MP_Notification_Audit]
  WHERE [target_user_id] = 100017

However, as soon as I add an extra column to the SELECT, the index is ignored and a table scan with a predicate is done to attain the result, reading over 11,700 rows:

SELECT [target_user_id], [patient_id]
  FROM [TestDb].[dbo].[MP_Notification_Audit]
  WHERE [target_user_id] = 100017

Why is it ignoring my index in this second query? I'd have thought it would still be more efficient to use the index to get down to 133 RIDs, then query the extra row data required, than to go through every row of the table with a predicate? I know I can add columns to the index with INCLUDE with the extra fields needed in the SELECT clause to make it use the index again, but I'm interested as to why it doesn't still use the index in this case.

Solution

Given the size of your table (~11k rows), I think it would be safe to assume that SQL Server estimated that the cost of performing a seek on the non-clustered index and then potentially multiple RID lookups was more expensive than performing a table scan.

There is some evidence to support this theory within the second query plan that you pasted. I would normally expect the Query Optimiser to suggest adding a covering index for your query as you have mentioned in your post. However, it did not. This to me suggests that SQL thinks that doing so would provide little or no improvement over a full table scan.

With all that being said, I am sure that if you added more rows to the table SQL Server may change its mind and ask you to add a covering index or start performing a seek + RID lookup as you expect. If you have Query Store enabled you can always keep an eye out for queries on this table that are causing problems - if it isn't causing you a problem, I wouldn't worry about it right now.

OTHER TIPS

A bit of extra info: this is the statistics for when you use table scan vs index seek+RID Lookup

Table Scan

Table 'MP_Notification_Audit'. Scan count 1, logical reads 154, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

Index seek + RID Lookup

(118 rows affected)
Table 'MP_Notification_Audit'. Scan count 1, logical reads 120, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 220 ms.

As can be seen, the diference in logical reads, is ~23%, but the absolute number are low, I/O system shouldnt even notice it, just 272kB.

But the difference in CPU is obvious, 220ms spent on RID lookup & Nested Loops is High. As was said in answer, its simple. The extra IO cost was correctly estimated to be lower then the extra cost in CPU.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange