How to define column order for non-clustered index
11-12-2019
Question
I have written a subquery that goes like this:
select top 1 A
from my_table
where B = some_joined_value_from_outer_query
order by C desc
I want to create a non-clustered index that would improve performance on this part.
3 questions here:
- What would be the correct order for the indexed columns?
- Also, should column A be an indexed column or just included in the index?
- Can this be rewritten without a subquery (notice the top 1 and order by desc), and could that improve performance?
EDIT: Here's the query (part of a data synchronisation process):
SELECT ProductId, (
    SELECT Id
    FROM [Order]
    WHERE Number = (
        SELECT TOP 1 OrderNumber
        FROM OtherDatabase..ReferenceTable
        WHERE ProductNumber = Product.Number
        ORDER BY [Date] DESC)
) AS OrderId
FROM Product
Solution
(A)
CREATE NONCLUSTERED INDEX foo ON dbo.my_table(B, C DESC) INCLUDE (A);
(B)
Since A is not in WHERE or ORDER BY, it is probably sufficient for it to be in the list of INCLUDEd columns, since it's just "along for the ride" and not necessary to be part of the key.
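To make (A) and (B) concrete, here is a sketch of the same pattern applied to the query from the question's edit: the equality column goes first in the key, the ORDER BY column second, and the selected column is only INCLUDEd. The index name is made up for illustration, and the dbo schema is an assumption (the question only shows OtherDatabase..ReferenceTable).

```sql
-- Assumption: ReferenceTable lives in the dbo schema of OtherDatabase.
USE OtherDatabase;

-- Hypothetical index name. Key order: equality predicate first
-- (ProductNumber), then the ORDER BY column ([Date] DESC).
-- OrderNumber is only returned, so it is INCLUDEd rather than keyed.
CREATE NONCLUSTERED INDEX IX_ReferenceTable_ProductNumber_Date
    ON dbo.ReferenceTable (ProductNumber, [Date] DESC)
    INCLUDE (OrderNumber);
```

With an index shaped like this, the inner TOP 1 subquery should be able to seek on ProductNumber and read the first matching row without a separate sort, though the actual plan still depends on the data and the rest of the query.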
(C)
Impossible to answer without more context. Why would you only include the subquery then ask about an outer query we can't see?
EDIT with regard to the ongoing conversation with @Quassnoi, I just wanted to demonstrate quickly that the sort direction of trailing columns in a non-clustered index can make a big difference on the plans used by a particular query. Let's take the following contrived example:
CREATE TABLE dbo.foo1(A INT, B INT, C INT);
CREATE NONCLUSTERED INDEX foo1x ON dbo.foo1(B, C) INCLUDE(A);
CREATE TABLE dbo.foo2(A INT, B INT, C INT);
CREATE NONCLUSTERED INDEX foo2x ON dbo.foo2(B, C DESC) INCLUDE(A);
INSERT dbo.foo1 SELECT TOP (500000) c.[object_id], c.[object_id], -1*c.[object_id]
FROM sys.all_columns AS c CROSS JOIN sys.all_objects
ORDER BY c.[object_id];
INSERT dbo.foo2 SELECT TOP (500000) c.[object_id], c.[object_id], -1*c.[object_id]
FROM sys.all_columns AS c CROSS JOIN sys.all_objects
ORDER BY c.[object_id];
Now, let's run these two proposed queries and inspect the plans:
SELECT A
FROM (
SELECT A, ROW_NUMBER() OVER (PARTITION BY B ORDER BY C DESC) RN
FROM dbo.foo1
) q
WHERE rn = 1;
SELECT A
FROM (
SELECT A, ROW_NUMBER() OVER (PARTITION BY B ORDER BY C DESC) RN
FROM dbo.foo2
) q
WHERE rn = 1;
Here's the plan for the query against dbo.foo1 (where C is ASC):
And the sort I mentioned:
And here's the plan for the query against dbo.foo2 (where C is DESC):
Now, if you add a WHERE clause to the inner query (e.g. WHERE B = -1024577103), the plans are more similar. But then that also implies that the PARTITION BY is unnecessary and that there needs to be some matching involved to limit the outer query to that value of B also. My point, however, is that while the sort direction of each column in an index might have little effect on the plan for the specific query in the question, this is not true for all queries that could use the same index.
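For comparison, the single-value form from the original question can be run against both test tables. This is only a sketch for inspecting the plans yourself. It reuses the B value from the example above, which will only match rows if that object_id exists on your instance.

```sql
-- Report logical reads per statement so the two queries can be compared.
SET STATISTICS IO ON;

-- Index on (B, C): the rows for one B value can be read backward,
-- so C DESC may be satisfied without a sort operator.
SELECT TOP (1) A
FROM dbo.foo1
WHERE B = -1024577103
ORDER BY C DESC;

-- Index on (B, C DESC): the same rows are read forward.
SELECT TOP (1) A
FROM dbo.foo2
WHERE B = -1024577103
ORDER BY C DESC;

SET STATISTICS IO OFF;
```

Run both with the actual execution plan enabled and check whether either plan contains a Sort. For this single-value shape, the expectation is that neither does.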
OTHER TIPS
Order does matter. The leading (first) column of the index is the most important.
An index seek is generally considered only if the first column listed in the index is used in the JOIN, ORDER BY, or WHERE clauses of the query. If the query does not reference the leading column that way, the index can at best be scanned, and it is often ignored altogether.
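As a rough illustration of the leading-column point, the following sketch reuses dbo.foo1 and its index foo1x on (B, C) INCLUDE (A) from the answer above. The literal values are arbitrary.

```sql
-- Filters on the leading key column B: eligible for a seek on foo1x.
SELECT A
FROM dbo.foo1
WHERE B = 42;

-- Filters only on the second key column C: a seek on foo1x is not
-- possible. Because foo1x still covers A and C, the optimizer may
-- choose to scan the whole index instead.
SELECT A
FROM dbo.foo1
WHERE C = -42;
```

Comparing the two execution plans (seek versus scan) shows why the column used for equality filtering normally belongs first in the key.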
You could rewrite it like this:
SELECT q.a
FROM (
    SELECT m.a, ROW_NUMBER() OVER (PARTITION BY m.b ORDER BY m.c DESC) AS rn
    FROM outer_table o
    JOIN my_table m
        ON m.b = o.b
) q
WHERE q.rn = 1
but the subquery may actually be faster (or may not).
See this entry in my blog: