Why does SQL Server prefer the nonclustered index over the clustered index?

https://dba.stackexchange.com/questions/235053

29-01-2021

Question

I am trying to speed up a table, and while experimenting I ran into what I think is an odd occurrence. I created a clustered index and a nonclustered index that should be equivalent. However, when I run queries against the table, SQL Server always uses the nonclustered index instead of the matching clustered index. On top of that, when appropriate, SQL Server correctly performs an index seek on the nonclustered index, but it always performs a scan on the clustered index.

Why does SQL Server prefer the nonclustered index?

And how can I rewrite this so that I keep the performance increase but have only the clustered index?

I have the following table structure:

CREATE TABLE [dbo].[Variables](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [Header] [varchar](255) NULL,
    [FullVariables] [varchar](max) NULL
)

Clustered index:

ALTER TABLE [dbo].[Variables] ADD  CONSTRAINT [PK_Variables] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)

Nonclustered index:

CREATE UNIQUE NONCLUSTERED INDEX [NonClusteredIndex-20190307-091011] ON [dbo].[Variables]
(
    [ID] ASC
)
INCLUDE (   [Header],
    [FullVariables])

My current understanding is that, in this case, both indexes should contain the data laid out in the same fashion, with [ID] as the key column and [Header] and [FullVariables] stored as extra data in the index rather than as pointers. If you have a source you could link, I am eager to read more.
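As a rough illustration of this expectation, here is a toy Python model (not SQL Server internals; the function and variable names are hypothetical) of which columns end up in each index's leaf rows for the table above. It encodes two rules as I understand them: a clustered index's leaf level holds every column, and a nonclustered leaf row holds the index key, the INCLUDE columns and the clustered index key (the row locator), with a column that is already present stored only once.

```python
# Toy model only: which columns appear in each index's leaf rows.
TABLE_COLUMNS = ["ID", "Header", "FullVariables"]  # dbo.Variables
CLUSTERING_KEY = ["ID"]                            # PK_Variables


def clustered_leaf_columns(table_columns):
    """The clustered index leaf level is the table itself: every column."""
    return set(table_columns)


def nonclustered_leaf_columns(key, include, clustering_key):
    """Key + INCLUDE columns + clustering key (row locator); a column that
    is already present is not stored a second time, hence the set union."""
    return set(key) | set(include) | set(clustering_key)


ci = clustered_leaf_columns(TABLE_COLUMNS)
nc = nonclustered_leaf_columns(["ID"], ["Header", "FullVariables"], CLUSTERING_KEY)
print(sorted(nc))  # ['FullVariables', 'Header', 'ID']
print(ci == nc)    # True
```

For this table the two column sets come out identical, which matches the intuition in the question; any size difference between the two indexes would have to come from physical details such as row overhead or page fullness, which this model ignores.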

I should specify that I don't always want a seek, and I understand that a scan is better in some cases (otherwise, why would we have it?). The table contains about 60 GB of data because of its row count (several million) multiplied by the size of the varchar(MAX) column, which holds strings that are 16,000+ characters long. Before inserting into the table, a scan is done to ensure no duplicates are inserted (matching on Header for elimination and on FullVariables). The table is then joined in several views on the ID field, where seeks are desired.

Solution

If SQL Server has two indexes to choose from, both of which satisfy ("cover") the query and provide the best possible path to locating and/or sorting the rows, you should consider it to be a coin flip. It's not really one, though: I believe there was some research done here (maybe by me, here and here) that showed it picked the most recently created one, or the first one alphabetically, or something else that is otherwise arbitrary.

However, if the coin flip, as we'll call it, involves a choice between a non-clustered index and a clustered index, and again both indexes properly satisfy the query, SQL Server will always lean toward the non-clustered one. Why? Because it's guaranteed to be no wider than the clustered index. The edge case where it is exactly as wide as the clustered index is not a consideration.
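The preference described here can be sketched as a toy Python function. This only models the answer's claim and is not the optimizer's real algorithm: `Candidate`, `pick_index` and the cost numbers are hypothetical. Among covering indexes the lowest estimated cost wins, and on an exact tie the non-clustered index is preferred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    clustered: bool
    covers_query: bool
    estimated_cost: float  # hypothetical cost units


def pick_index(candidates):
    """Toy rule: cheapest covering index wins; on an exact cost tie the
    non-clustered index wins because False sorts before True."""
    covering = [c for c in candidates if c.covers_query]
    if not covering:
        raise ValueError("no covering index among the candidates")
    return min(covering, key=lambda c: (c.estimated_cost, c.clustered))


ci = Candidate("PK_Variables", clustered=True, covers_query=True, estimated_cost=10.0)
nc = Candidate("NonClusteredIndex-20190307-091011", clustered=False,
               covers_query=True, estimated_cost=10.0)
print(pick_index([ci, nc]).name)  # NonClusteredIndex-20190307-091011

pricier_nc = Candidate(nc.name, clustered=False, covers_query=True, estimated_cost=12.0)
print(pick_index([ci, pricier_nc]).name)  # PK_Variables
```

If the non-clustered index is given a higher estimated cost, the model picks the clustered index instead; seeing the real optimizer do the opposite would be exactly the counter-example asked for below.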

You should look at the costs involved with each execution plan, and confirm that the costs SQL Server estimates for the non-clustered index are <= those for the clustered index. If you can show a counter-example, where the non-clustered index is chosen even though its estimated costs are higher than the clustered, please do.

OTHER TIPS

The best way to get SQL Server to categorically stop using the non-clustered index would be to drop the index. Your question is unclear about why you want to read the clustered index instead of the non-clustered index.

SQL Server believes it will be faster to use the non-clustered index, while still returning the exact results you need. So, why do you want SQL Server to return results more slowly by scanning the clustered index?

Add the exact query to your question, and upload the plans to https://www.brentozar.com/pastetheplan

I populated your table (with the same indexes) with 500,000 rows.

Then I ran these queries:

-- Nonclustered index (index_id = 2 here; check sys.indexes to confirm)
-- Arguments: database_id, object_id, index_id, partition_number, mode
SELECT index_level
    ,index_type_desc
    ,alloc_unit_type_desc
    ,page_count
    ,record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Variables'), 2, 1, 'DETAILED');

-- Clustered index (a clustered index always has index_id = 1)
SELECT index_level
    ,index_type_desc
    ,alloc_unit_type_desc
    ,page_count
    ,record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Variables'), 1, 1, 'DETAILED');

I noticed that the page count of the clustered index was slightly higher than that of the nonclustered index.

So perhaps the optimizer calculates that it has to read fewer pages with the nonclustered index than with the clustered index.

Even if both columns were only varchar(100) or so, the nonclustered index would still be preferred over the clustered index, for the same reason.

The leaf pages of a clustered index must contain every column of the table.

The leaf pages of a nonclustered index contain only its key columns, its INCLUDE columns, and the clustered index key.

So the clustered index's page count is typically equal to or higher than that of a nonclustered index.

That is why the optimizer prefers the nonclustered index in situations like this.
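The page-count argument can be made concrete with back-of-the-envelope arithmetic. This Python sketch assumes roughly 8,096 bytes of row space per 8 KB page and ignores per-row slot overhead, off-row LOB pages and the upper index levels; the 60- and 64-byte row widths are hypothetical, chosen only to show that a slightly wider leaf row produces more leaf pages for the same 500,000 rows.

```python
import math

USABLE_BYTES_PER_PAGE = 8096  # approx. row space on an 8 KB page (8192 minus 96-byte header)


def leaf_pages(row_count: int, avg_row_bytes: int) -> int:
    """Rough leaf-level page count: whole rows per page, then round up."""
    rows_per_page = max(1, USABLE_BYTES_PER_PAGE // avg_row_bytes)
    return math.ceil(row_count / rows_per_page)


ROWS = 500_000                 # row count used in the test above
narrow = leaf_pages(ROWS, 60)  # hypothetical nonclustered leaf row width
wide = leaf_pages(ROWS, 64)    # hypothetical clustered leaf row width
print(narrow, wide)  # 3732 3969
```

A four-byte difference per row already means roughly 6% more pages to read in a full scan, which is the kind of difference an estimate based on page counts would favour.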

You wrote: "Before inserting into the table, a scan is done to ensure no duplicates are inserted (matching on Header for elimination and on FullVariables)."

This is not clear. Do you check for duplicates only on Header, or on both columns (Header and FullVariables)?

Can you share the query used for that check?

Licensed under: CC-BY-SA with attribution
