Question

I have written a subquery that goes like this:

select top 1 A
from my_table
where B = some_joined_value_from_outer_query
order by C desc

I want to create a non-clustered index that would improve performance on this part.

3 questions here:

  1. What would be the correct order for the indexed columns?
  2. Also, should column A be an indexed column or just included in the index?
  3. Can this be rewritten without a subquery (notice the top 1 and order by desc) and could that improve performance?

EDIT: Here's the query (part of a data synchronisation process):

SELECT ProductId, (
    SELECT Id 
    FROM [Order]
    WHERE Number = (
        SELECT TOP 1 OrderNumber
        FROM OtherDatabase..ReferenceTable
        WHERE ProductNumber = Product.Number
        ORDER BY [Date] DESC)
    ) AS OrderId
FROM Product

Solution

(A)

CREATE NONCLUSTERED INDEX foo ON dbo.my_table(B, C DESC) INCLUDE (A);

(B)

Since A is not in the WHERE or ORDER BY clause, it is probably sufficient to put it in the list of INCLUDEd columns: it's just "along for the ride" and doesn't need to be part of the key.
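
Applied to the query in the question's EDIT, the same pattern might look like the sketch below. The index name is made up, and the dbo schema is an assumption (the question only writes OtherDatabase..ReferenceTable). ProductNumber is the equality predicate, [Date] is the ORDER BY column, and OrderNumber is the only column returned.

```sql
-- Hypothetical index name; the dbo schema is an assumption.
-- Key columns: equality column first, then the ORDER BY column.
-- OrderNumber is only selected, so it goes in INCLUDE.
CREATE NONCLUSTERED INDEX IX_ReferenceTable_ProductNumber_Date
    ON OtherDatabase.dbo.ReferenceTable (ProductNumber, [Date] DESC)
    INCLUDE (OrderNumber);
```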

(C)

Impossible to answer without more context. Why would you only include the subquery then ask about an outer query we can't see?

EDIT: With regard to the ongoing conversation with @Quassnoi, I just wanted to demonstrate quickly that the sort direction of trailing columns in a non-clustered index can make a big difference to the plans chosen for a particular query. Let's take the following contrived example:

CREATE TABLE dbo.foo1(A INT, B INT, C INT);
CREATE NONCLUSTERED INDEX foo1x ON dbo.foo1(B, C) INCLUDE(A);

CREATE TABLE dbo.foo2(A INT, B INT, C INT);
CREATE NONCLUSTERED INDEX foo2x ON dbo.foo2(B, C DESC) INCLUDE(A);

INSERT dbo.foo1 SELECT TOP (500000) c.[object_id], c.[object_id], -1*c.[object_id]
FROM sys.all_columns AS c CROSS JOIN sys.all_objects
ORDER BY c.[object_id];

INSERT dbo.foo2 SELECT TOP (500000) c.[object_id], c.[object_id], -1*c.[object_id]
FROM sys.all_columns AS c CROSS JOIN sys.all_objects
ORDER BY c.[object_id];

Now, let's run these two proposed queries and inspect the plans:

SELECT  A
FROM    (
        SELECT  A, ROW_NUMBER() OVER (PARTITION BY B ORDER BY C DESC) RN
        FROM    dbo.foo1
        ) q
WHERE   rn = 1;

SELECT  A
FROM    (
        SELECT  A, ROW_NUMBER() OVER (PARTITION BY B ORDER BY C DESC) RN
        FROM    dbo.foo2
        ) q
WHERE   rn = 1;

Here's the plan for the query against dbo.foo1 (where C is ASC):

[image: execution plan for the query against dbo.foo1]

And the sort I mentioned:

[image: the sort in the execution plan]

And here's the plan for the query against dbo.foo2 (where C is DESC):

[image: execution plan for the query against dbo.foo2]

Now, if you add a WHERE clause to the inner query (e.g. WHERE B = -1024577103), the plans are more similar. But then that also implies that the PARTITION BY is unnecessary and that there needs to be some matching involved to limit the outer query to that value of B as well. My point still stands, however: for the specific query in the question, the sort direction of each column in an index might have little effect on the plan, but this is not true for all queries that could use the same index.
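
For reference, the filtered variant described above could be written as follows against the dbo.foo1 test table (swap in dbo.foo2 to compare). This is only a sketch: SET STATISTICS XML ON is just one way to capture the actual plans, and the literal is the example value from the paragraph above, which may not exist in your copy of the data.

```sql
SET STATISTICS XML ON;  -- return the actual execution plan with each result

-- Inner query restricted to a single value of B.
SELECT  A
FROM    (
        SELECT  A, ROW_NUMBER() OVER (PARTITION BY B ORDER BY C DESC) AS RN
        FROM    dbo.foo1
        WHERE   B = -1024577103   -- example value from the text above
        ) AS q
WHERE   RN = 1;

-- With B fixed to one value, PARTITION BY is redundant,
-- so the same row can be fetched with TOP (1) (ties are broken arbitrarily).
SELECT  TOP (1) A
FROM    dbo.foo1
WHERE   B = -1024577103
ORDER BY C DESC;

SET STATISTICS XML OFF;
```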

OTHER TIPS

Column order does matter. The leading (first) key column of the index is the most important.

Whether the optimizer can seek on an index depends on its first column. An index can be used for a seek, or to avoid a sort, only if its leading key column appears in the JOIN, WHERE, or ORDER BY clauses of the query. If the leading column is not referenced there, the optimizer cannot seek on the index; at best it can scan the whole index, which it may still choose to do when the index covers the query.
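
A small sketch of what this means in practice, assuming the index foo on dbo.my_table(B, C DESC) INCLUDE (A) suggested in the accepted answer. The literal 42 is a placeholder (the column types are not given in the question), and the comments describe the plan shapes one would typically expect, not measured results.

```sql
-- Leading key column B is filtered: the optimizer can seek on the index,
-- and the C DESC ordering within B lets it avoid a sort.
SELECT A FROM dbo.my_table WHERE B = 42 ORDER BY C DESC;

-- Only the second key column C is filtered: no seek on B is possible,
-- so the index can at best be scanned from end to end.
SELECT A FROM dbo.my_table WHERE C = 42;
```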

You could rewrite it like this:

SELECT  q.a
FROM    (
        SELECT  m.a, ROW_NUMBER() OVER (PARTITION BY m.b ORDER BY m.c DESC) RN
        FROM    outer_table o
        JOIN    my_table m
        ON      m.b = o.b
        ) q
WHERE   rn = 1

but the subquery may actually be faster (or may not).
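
Applied to the query from the question's EDIT, the same ROW_NUMBER pattern might look like the sketch below. It assumes that ProductId uniquely identifies a Product row and that [Order].Number is unique (the original scalar subquery would raise an error otherwise); the aliases p, r, o and q are made up for the example. Whether it beats the correlated subquery would need to be checked against the actual plans.

```sql
SELECT  q.ProductId, o.Id AS OrderId
FROM    (
        -- Number each product's reference rows, newest [Date] first.
        -- LEFT JOIN keeps products that have no reference rows at all.
        SELECT  p.ProductId, r.OrderNumber,
                ROW_NUMBER() OVER (PARTITION BY p.ProductId
                                   ORDER BY r.[Date] DESC) AS rn
        FROM    Product AS p
        LEFT JOIN OtherDatabase..ReferenceTable AS r
        ON      r.ProductNumber = p.Number
        ) AS q
LEFT JOIN [Order] AS o
ON      o.Number = q.OrderNumber
WHERE   q.rn = 1;   -- keep only the newest reference row per product
```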

See this entry in my blog:

Licensed under: CC-BY-SA with attribution