Redundant fields in ORDER BY for efficient sorting?

https://dba.stackexchange.com/questions/234460

27-01-2021
|

Domanda

Suppose I have a table SortTest with fields Data1, Data2, Sort1, and Sort2, Key1 and Key2 need to perform the following query:

SELECT TOP 1
    Data1,
    Data2
FROM SortTest
WHERE Key1 = @key1 AND
      Key2 = @key2
ORDER BY Sort1, Sort2

In order to opimise it, I created an index for the following sequence

Key1, Key2, Sort1, Sort2

But the execution plan still shows an index scan instead of a seek, for it cannot sort effectively on a field sequence that does not start an index. In order, therefore, to optimise the query, I had to add they keys to the ORDER BY clause, which are, of course, redundant:

SELECT TOP 1
    Data1,
    Data2
FROM SortTest
WHERE Key1 = @key1 AND
      Key2 = @key2
ORDER BY Key1, Key2, Sort1, Sort2

The query now works as expected, but I should like to know whether it can be optimised in a more elegant way.

Correction:

I realized that when I simplified the queries above, there was a grave mistake. One of the condition had an IN, not =, so the real queries (that produce the different plans) have this instead:

WHERE Key1 = @key1 AND
      Key2 = IN (@key2a, @key2b, ...)

which explains (that the key2 values in the result are not fixed and thus the different ORDER BY can result in different output) and the plan difference.

Let thank every body for their help and apologies for the confusion.

Soluzione

This is actually pretty straightforward. When performing a query, SQL Server first looks to identify the rows to be returned. If there are WHERE clause elements, those are checked before the system even considers using an index used for the ORDER BY.

This makes perfect sense if you consider the possibilities.

No index on anything in the WHERE clause -- must perform table or clustered index scan
Index on one element in the WHERE clause -- Scan all rows as filtered by the one element for other matches
Index on all elements in the WHERE clause -- Select all rows based on the index. Must look up data in record.
Index on all elements in the WHERE clause, and INCLUDE of the data elements -- Select all rows based on the index. Use the data elements in the INCLUDE statement.
Index on all elements in the WHERE clause plus all data elements -- Select all rows based on the index, and use the data elements embedded in the index.
Index on the data elements only -- If there is a clustered index, looking up data on that index will be most efficient. Unknown if the query optimizer would scan the data element index to try to make the data retrieval more efficiend, but it is doubtful.

Basically, your best index would be:

CREATE INDEX MyIndex ON MyTable (Key1, Key2, Sort1, Sort2)

The data would be searched on the Key information (which is what an index is designed for), then use the additional information, which is already sorted in the index, for the output.

Edited

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a dba.stackexchange