
I'm utilising Azure SQL Database 2019.

Let's say I have columns A-F and the following indexes.

Primary Key (clustered): A

Index 1: B (ASC),C (ASC)

Index 2: B (ASC)

If I wanted to run the following query, is Index 2 needed, or could the query use Index 1 without an impact on performance?



The single best way to answer your question is to run the query, measure the performance, and look at the execution plan. Do this in two separate runs: measure performance in one and capture the execution plan in the other, because capturing execution plans affects performance. See which index is used and how it's used. Then disable the index that was used, run the query again, and see what happens. That will tell you more than any answer here is going to.

However, based on the info provided, and assuming a relatively even distribution of the data and good selectivity in the index, this is what you're likely to see: Index 2 is used to seek the supplied value 'X', and a key lookup fetches the columns not included in Index 2 (C, D and E; the clustered key A is already carried in every nonclustered index). Index 1 is unlikely to be used because its wider key means more pages and possibly a deeper index. The optimizer is more likely to pick Index 2 in this scenario.

Let me emphasize, though, that this answer is speculation; without testing, there's no way to be certain. It's also possible that Index 1, being a compound key and therefore more selective, is more attractive to the optimizer, in which case it would be used with a key lookup. This is less likely, but testing will tell you more than speculation will.
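The test described above could look roughly like the following. This is only a sketch: the table name dbo.MyTable and the index name IX_MyTable_B (standing in for Index 2) are hypothetical, and the query is assumed to filter on B = 'X' as discussed above.

```sql
-- Hypothetical names: dbo.MyTable is the table, IX_MyTable_B stands in for Index 2.
SET STATISTICS IO ON;    -- report logical reads per table
SET STATISTICS TIME ON;  -- report CPU and elapsed time

-- Run 1: baseline with both indexes available.
SELECT A, B, C, D, E FROM dbo.MyTable WHERE B = 'X';

-- Disable the index the plan used (assumed here to be Index 2).
ALTER INDEX IX_MyTable_B ON dbo.MyTable DISABLE;

-- Run 2: same query, now without that index.
SELECT A, B, C, D, E FROM dbo.MyTable WHERE B = 'X';

-- A disabled index is re-enabled by rebuilding it.
ALTER INDEX IX_MyTable_B ON dbo.MyTable REBUILD;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Compare the logical reads and elapsed times of the two runs, and capture the execution plans in a separate pass so the capture doesn't distort the timings.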

Other tips

It depends™

Let's go through this step by step. We'll create a table and all relevant indexes and then populate our table with some data.

Create Table Q275204

USE [StackExchange]

/****** Object:  Table [dbo].[Q275204]    Script Date: 10.09.2020 07:54:00 ******/


CREATE TABLE [dbo].[Q275204](
    [A] [nchar](1) NULL,
    [B] [nchar](1) NULL,
    [C] [nchar](1) NULL,
    [D] [nchar](1) NULL,
    [E] [nchar](1) NULL,
    [F] [nchar](1) NULL
)


Now we have an empty table. Let's add the indexes.

Create Index CPIX_Q275204_A_hot2use_20200910

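This will be the clustered index on the table Q275204, keyed on column A. It stands in for the clustered primary key from the question, but the script creates a plain clustered index rather than a PRIMARY KEY constraint, because the test data inserted later repeats values in A.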

USE [StackExchange]

/****** Object:  Index [CPIX_Q275204_A_hot2use_20200910]    Script Date: 10.09.2020 07:56:44 ******/
CREATE CLUSTERED INDEX [CPIX_Q275204_A_hot2use_20200910] ON [dbo].[Q275204]
(
    [A] ASC
)

Create Index NIX_Q275204_BC_hot2use_20200910

Next index up is the non-clustered index on table Q275204 containing the columns B and C.

USE [StackExchange]

/****** Object:  Index [NIX_Q275204_BC_hot2use_20200910]    Script Date: 10.09.2020 07:58:40 ******/
CREATE NONCLUSTERED INDEX [NIX_Q275204_BC_hot2use_20200910] ON [dbo].[Q275204]
(
    [B] ASC,
    [C] ASC
)

Create Index NIX_Q275204_B_hot2use_20200910

USE [StackExchange]

/****** Object:  Index [NIX_Q275204_B_hot2use_20200910]    Script Date: 10.09.2020 08:00:09 ******/
CREATE NONCLUSTERED INDEX [NIX_Q275204_B_hot2use_20200910] ON [dbo].[Q275204]
(
    [B] ASC
)

First Query Without Data

We now have an empty table without any data. Which index will be used for your query? Will an index be used at all?

SELECT A, B, C, D, E FROM dbo.Q275204 WHERE B = 'X'

As it turns out, the Query Engine selects the Clustered Index. This is because the Clustered Index is the actual data and because of that, the fastest way to retrieve the data is to just scan the clustered index:

Picture of Query Execution Plan using Clustered Index Scan on table Q275204

The query plan can be viewed here online (Brent Ozar's Paste The Plan)

Add Data And Query Table For 2nd Time

Let's add some data to the table to change things:

USE [StackExchange]
INSERT INTO [dbo].[Q275204]
  ([A], [B], [C], [D], [E], [F])
VALUES
  ('X', 'X', 'X', 'X', 'X', 'X')
GO 10000

...and query the table again

SELECT A, B, C, D, E FROM dbo.Q275204 WHERE B = 'X'

...to get the following execution plan:

Picture of Query Execution Plan using Clustered Index Scan on table Q275204

And online (Brent Ozar's Paste The Plan)


This time the Query Engine has again decided to use a clustered index scan, but it suggests that we create a new index on column B that includes the columns A, C, D and E.

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Q275204] ([B])
INCLUDE ([A],[C],[D],[E])

The query optimizer hasn't used the existing index NIX_Q275204_BC_hot2use_20200910 (columns B and C) nor has it used the index NIX_Q275204_B_hot2use_20200910 (column B).


At this point you could be misled into believing that deleting all the indexes would be the best option. I mean, hey, they aren't being used and are only taking up space.

That assumption could be wrong. The indexes aren't being used yet, because the distribution of the current data (all data are X) doesn't require the query optimizer to consider any index other than the clustered index (with the scan option). The query optimizer knows (because of the statistics of the index) that the data in the table are all X, so scanning the clustered index to retrieve all the data is still the fastest option available.
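One way to see what the optimizer is avoiding is to force the non-clustered index with a table hint and compare the I/O. This is a sketch for experimenting on the test table only, not something the walkthrough itself ran; index hints are rarely a good idea in production code.

```sql
SET STATISTICS IO ON;

-- Plan the optimizer picks on its own (a clustered index scan at this point).
SELECT A, B, C, D, E FROM dbo.Q275204 WHERE B = 'X';

-- Same query, forced onto the single-column index. The index holds only
-- B (plus the clustered key A), so C, D and E must be fetched by lookups.
SELECT A, B, C, D, E
FROM dbo.Q275204 WITH (INDEX (NIX_Q275204_B_hot2use_20200910))
WHERE B = 'X';

SET STATISTICS IO OFF;
```

If the forced version reports more logical reads than the scan, that is consistent with the optimizer's choice to ignore the index for this data distribution.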


When you create an index, the database engine also creates statistics based on the distribution of the data in the table/index. These statistics help the query optimizer decide which indexes to use when it builds the query execution plan for a statement.

Here are the indexes and statistics of our current table:

Picture of Table Columns, Indexes and Statistics

The data or histogram of the distribution of the data in an index can look like this:

Statistics for INDEX 'NIX_Q275204_B_hot2use_20200910'.

Name                            Updated              Rows   Rows Sampled  Steps  Density  Average Key Length  String Index  Filter Expression  Unfiltered Rows
NIX_Q275204_B_hot2use_20200910  Sep 10 2020  8:59AM  10000  10000         1      0        4                   YES                              10000

All Density  Average Length  Columns
1            2               B
1            4               B, A

Histogram Steps
RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
X             0           10000    0                    1

The statistics provide information about the distribution of the data in the index (and the table). Based on this information, the query optimizer decides which plan to create (or which existing plan to use).
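Output like the above can be produced with DBCC SHOW_STATISTICS. The sketch below assumes the table and index names used in this walkthrough; the first query is an optional extra that lists which statistics objects exist on the table.

```sql
-- List the statistics objects on the table (one per index,
-- plus any column statistics the engine created automatically).
SELECT s.name,
       s.stats_id,
       s.auto_created,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.Q275204');

-- Header, density vector and histogram for the index on column B.
DBCC SHOW_STATISTICS ('dbo.Q275204', 'NIX_Q275204_B_hot2use_20200910');
```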

Add More Data And Query Table For 3rd Time

We'll add another 20'000 records of Ys and Zs and see what happens:

USE [StackExchange]

INSERT INTO [dbo].[Q275204]
  ([A], [B], [C], [D], [E], [F])
VALUES
  ('Y', 'Y', 'Y', 'Y', 'Y', 'Y'),
  ('Z', 'Z', 'Z', 'Z', 'Z', 'Z')

GO 10000

The query is the same so let's go straight to the query plan.

Picture of Query Execution Plan

And online (Paste The Plan)

(No) Differences

The query plan is the same and the missing index is still being suggested.


Let's have a look at what Microsoft has to say about statistics, and especially about density. (Compare this with the statistics for the non-clustered index once the table contains 30'000 records.)

Density is information about the number of duplicates in a given column or combination of columns and it is calculated as 1/(number of distinct values). The query optimizer uses densities to enhance cardinality estimates for queries that return multiple columns from the same table or indexed view. As density decreases, selectivity of a value increases. For example, in a table representing cars, many cars have the same manufacturer, but each car has a unique vehicle identification number (VIN). An index on the VIN is more selective than an index on the manufacturer, because VIN has lower density than manufacturer.

So as the density of a column goes down, its selectivity increases, and the probability that the index will be used in a SELECT statement rises.
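The definition quoted above, 1/(number of distinct values), can be checked by hand against the table. This is a rough sketch: it computes the density from the current data, whereas the value in the statistics reflects the data as of the last statistics update.

```sql
-- Density of column B computed directly from the data.
-- COUNT(DISTINCT ...) ignores NULLs; the test data contains none.
SELECT COUNT(DISTINCT B) AS distinct_values,
       1.0 / NULLIF(COUNT(DISTINCT B), 0) AS density
FROM dbo.Q275204;
```

While the table holds only X, there is one distinct value and the density is 1, which matches the "All Density" value shown in the statistics output earlier. Each additional distinct letter lowers the density.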

Let's go and add more data and see if the index will be used at some point...

Add Lots More Data And Query Table For 4th Time

We'll add another 230'000 records of all the letters we haven't yet used and see what happens:

USE [StackExchange]

INSERT INTO [dbo].[Q275204]
  ([A], [B], [C], [D], [E], [F])
VALUES
  ('A', 'A', 'A', 'A', 'A', 'A'),
  ('B', 'B', 'B', 'B', 'B', 'B'),
  ('C', 'C', 'C', 'C', 'C', 'C'),
  ('D', 'D', 'D', 'D', 'D', 'D'),
  ('E', 'E', 'E', 'E', 'E', 'E'),
  ('F', 'F', 'F', 'F', 'F', 'F'),
  ('G', 'G', 'G', 'G', 'G', 'G'),
  ('H', 'H', 'H', 'H', 'H', 'H'),
  ('I', 'I', 'I', 'I', 'I', 'I'),
  ('J', 'J', 'J', 'J', 'J', 'J'),
  ('K', 'K', 'K', 'K', 'K', 'K'),
  ('L', 'L', 'L', 'L', 'L', 'L'),
  ('M', 'M', 'M', 'M', 'M', 'M'),
  ('N', 'N', 'N', 'N', 'N', 'N'),
  ('O', 'O', 'O', 'O', 'O', 'O'),
  ('P', 'P', 'P', 'P', 'P', 'P'),
  ('Q', 'Q', 'Q', 'Q', 'Q', 'Q'),
  ('R', 'R', 'R', 'R', 'R', 'R'),
  ('S', 'S', 'S', 'S', 'S', 'S'),
  ('T', 'T', 'T', 'T', 'T', 'T'),
  ('U', 'U', 'U', 'U', 'U', 'U'),
  ('V', 'V', 'V', 'V', 'V', 'V'),
  ('W', 'W', 'W', 'W', 'W', 'W')

GO 10000

That's looking good.

Hold The Horses!

Before we go along and select the data, let's have a look at the statistics for the index that was created for column B.

Statistics for INDEX 'NIX_Q275204_B_hot2use_20200910'.

Name                            Updated              Rows   Rows Sampled  Steps  Density  Average Key Length  String Index  Filter Expression  Unfiltered Rows
NIX_Q275204_B_hot2use_20200910  Sep 10 2020  8:59AM  10000  10000         1      0        4                   YES                              10000

All Density  Average Length  Columns
1            2               B
1            4               B, A

Histogram Steps
RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
X             0           10000    0                    1

No change!

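Statistics aren't updated immediately after every data change, because updating them can be an expensive operation. The database engine updates the statistics when it considers them stale (if automatic statistics updates are enabled) or when an update is triggered manually, for example by a maintenance job.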
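For reference, this is roughly how you could check how stale the statistics are and refresh them by hand. It is a sketch using the names from this walkthrough and was not part of the experiment; refreshing the statistics here could change the plans shown in the following steps.

```sql
-- How many modifications have accumulated since each statistics
-- object on the table was last updated?
SELECT s.name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Q275204');

-- Refresh the statistics of the index on column B by reading every row.
UPDATE STATISTICS dbo.Q275204 NIX_Q275204_B_hot2use_20200910 WITH FULLSCAN;
```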

Query Table For 4th Time

We'll run the query for the 4th time and see what happens.

Picture of Query Execution Plan Using Clustered Index

And Paste The Plan


The index on column B still isn't being used, but the missing index is now shouting that the benefits would be immense:

Missing Index (Impact 98.0765): CREATE NONCLUSTERED INDEX ....

Let's take a short break and go back to your question.

Your Question / My (Current) Answer

If I wanted to run the following query, is Index 2 needed, or could the query use Index 1 without an impact on performance?

Up to now you could drop both indexes and you wouldn't notice much difference. But as the density in the index/statistics drops, the selectivity increases, up to a point where your index might be required.

It depends....

Add More Data And Query Table For 5th Time

We'll add another 230'000 records containing all the other letters, except X, Y and Z and see how the query responds.

Lo and behold: no change.

Picture of Query Execution Plan using Clustered Index

The Missing Index is still shouting and we are still triggering the use of the Clustered Index.

Add More Data And Query Table For 6th Time

Let's add a 0 to the GO statement of our INSERT statement, add 2.3 million records to the table, and then select away....

2.3 million records added and the query is still using the Clustered Index. The Missing Index impact has increased again, but we still aren't using any of the non-clustered indexes.

Add More Data And Query Table For 7th Time

Let's add a further 0 to the GO statement of the INSERT and add 23 million records to our already existing 3.04 million records. Then we'll SELECT again and see if the query optimizer starts using the NIX_Q275204_B_hot2use_20200910 index on column B.

Sorry, that took a while [18:00 min] to insert the 23 million records.

Picture of Query Execution Plan Using Non-Clustered Index NIX_Q275204_BC_hot2use_20200910

And the Paste The Plan to examine online.

Musings / Findings

Well, what a surprise. Your query has used the index covering the B and C columns of your table, which is close to what the Missing Index suggestion was shouting about.

Your Question / My (Current) Answer

If I wanted to run the following query, is Index 2 needed, or could the query use Index 1 without an impact on performance?

Well, with enough data, your index 1 (covering columns B and C) will eventually be used. Index 2 is not (yet) required.

But it depends...

Parameterized Queries

If you change the query to use parameters, then the index on columns B and C is no longer used.

DECLARE @x nchar(1) = N'X'
SELECT A, B, D, E FROM dbo.Q275204 WHERE B = @x

The query engine goes back to scanning the Clustered Index.
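A likely reason is that the value of a local variable isn't known when the statement is compiled, so the optimizer estimates from the average density instead of the histogram step for 'X'. The sketch below shows one way to test that idea with OPTION (RECOMPILE); it was not part of the original experiment, and whether the plan actually changes depends on your data and statistics.

```sql
DECLARE @x nchar(1) = N'X';

-- Local variable: the value is unknown at compile time,
-- so the row estimate comes from the density vector.
SELECT A, B, D, E FROM dbo.Q275204 WHERE B = @x;

-- OPTION (RECOMPILE): the statement is compiled with the current value
-- of @x, so the optimizer can consult the histogram as it would for
-- the literal 'X'.
SELECT A, B, D, E FROM dbo.Q275204 WHERE B = @x OPTION (RECOMPILE);
```

Compare the estimated row counts in the two execution plans to see which source the estimate came from.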

Missing Index

If you create the missing index as suggested earlier on, then the query optimizer uses that index.

CREATE NONCLUSTERED INDEX NIX_Q275204_B_INCL_ACDE_hot2use_20200910 ON [dbo].[Q275204] ([B]) INCLUDE ([A],[C],[D],[E])

Run the query and:

Picture of Execution Plan using Missing Index

See also Paste The Plan


Optimizing indexes is a very time-consuming activity. Determining whether an index should be used, dropped or created is equally so.

You have to observe your environment, your application and your database to determine if and when an index could be used. Don't create every index that is suggested, but choose wisely. Sometimes they can help; other times they are just additional data in a database.
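One way to do that observing is to look at the index usage counters the engine keeps. The query below is a minimal sketch for the test table from this answer; the counters are reset when the database engine restarts, so judge them over a representative period.

```sql
-- Seeks, scans, lookups and updates recorded per index on the table.
-- Indexes that were never touched show NULL counters.
SELECT i.name AS index_name,
       i.type_desc,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id = i.object_id
      AND us.index_id = i.index_id
      AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.Q275204');
```

An index with many user_updates but no seeks, scans or lookups is being maintained without being read, which makes it a candidate for closer inspection before dropping.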

License: CC-BY-SA with attribution
Not affiliated with dba.stackexchange