SQL Server 2008 Index Optimization - clustered lookup vs nonclustered include

https://stackoverflow.com/questions/7317032

26-10-2019
|

题

This is a long, involved question about index optimization theory. This is not homework, though I was first exposed to this question in a sample exam for Microsoft's 70-432. The original question was about general query optimization, but then I found this peculiar behavior I could not explain.

First, the table:

CREATE TABLE Invoice_details (
Invoice_id int NOT NULL,
Customer_id int NOT NULL,
Invoice_date datetime DEFAULT GETDATE() NULL,
Amount_total int NULL,
Serial_num int IDENTITY (1,1) NOT NULL)

Now, a clustered index, and the two indexes for testing:

CREATE UNIQUE CLUSTERED INDEX [ix_serial] ON [dbo].[Invoice_details] ([Serial_num] ASC)
/* Below is the "original" index */
CREATE NONCLUSTERED INDEX [ix_invoice_customer] ON [dbo].[Invoice_details] 
    ([Invoice_id] ASC,[Customer_id] ASC)
/* Below is the "optimized" index (adds one included field) */
CREATE NONCLUSTERED INDEX [ix_invoice_customer_inc] ON [dbo].[Invoice_details] 
    ([Invoice_id] ASC,[Customer_id] ASC) INCLUDE ([Invoice_date])

I also added some random test data to the table - 100000 rows. Invoice_id, Customer_id, and Amount_total each received their own random values (range 1000-9999), and Invoice_date received GETDATE() plus a random number of seconds (range 1000-9999). I can provide the actual routine I used, but did not think the specifics would be relevant.

And finally, the query:

SELECT Invoice_id,Customer_id,Invoice_date FROM Invoice_details WHERE Customer_id=1234;

Obviously, the query's first step will be a nonclustered index scan. Regardless of which index is used, that first step will return the same number of index rows. With the "original" index, the next step will be a lookup via the clustered index to retrieve Invoice_date, followed by an internal JOIN between the two sets. With the "optimized" index, that field is included in the index leaf, so the planner goes straight to returning the results.

Which index results in faster execution, and why?

解决方案

It depends ... on the tipping point.

其他提示

Assuming no issues such as fragmentation then it comes down to selectivity of the query.

The 2 indexes are very similar. Because the "optimized" one includes an additional column in the leaf pages then a full scan of that index may well mean more pages need to be read compared to the original one. However if more than a handful of rows are due to be returned I would expect the benefit of not needing the lookup to very quickly outweigh this minor disadvantage.

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow