Adding an additional “order by” column gives me a much worse plan
题
in other words, how can I get rid of the sort
operator on the picture below?
the picture above shows the execution plan of the following 2 selects together:
SELECT TOP 1 so.OrgType,
ch.Status,
rcs.DBSstatusDescription,
cid.ApplicationId
FROM tbl_application_crb_initialData cid
INNER JOIN tbl_season_organisationId so
ON cid.OrganisationId = so.OrgId
LEFT JOIN tbl_crbHistory ch
ON cid.ClientReference = ch.ClientReference
LEFT JOIN ref_crbStatus rcs
ON ch.Status = rcs.statusId
ORDER BY cid.DateAdded DESC, ch.DateAdded DESC
SELECT TOP 1 so.OrgType,
ch.Status,
rcs.DBSstatusDescription,
cid.ApplicationId
FROM tbl_application_crb_initialData cid
INNER JOIN tbl_season_organisationId so
ON cid.OrganisationId = so.OrgId
LEFT JOIN tbl_crbHistory ch
ON cid.ClientReference = ch.ClientReference
LEFT JOIN ref_crbStatus rcs
ON ch.Status = rcs.statusId
ORDER BY cid.DateAdded DESC--, ch.DateAdded DESC
the only difference is that on the second query, there is only one column in the order by.
would it make a difference, as I am using top 1
?
I believe all the needed info are on the indexes and table definitions that can be seen on the query plan.
if anything else would help to get rid of that sort
just let me know, tomorrow I will post all the possible info.
解决方案
Your question is missing a lot of detail but I can reproduce something similar.
Setup
CREATE TABLE T1(X INT PRIMARY KEY, Y INT INDEX IX)
CREATE TABLE T2(X INT, Y INT , PRIMARY KEY(X, Y))
INSERT INTO T2
OUTPUT INSERTED.* INTO T1
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY @@SPID), ROW_NUMBER() OVER (ORDER BY @@SPID)
FROM sys.all_objects o1, sys.all_objects o2;
Query 1
SELECT TOP 1 T1.Y AS T1Y, T2.Y AS T2Y
FROM T1 JOIN T2 ON T1.X = T2.X
ORDER BY T1.Y;
Query 2
SELECT TOP 1 T1.Y AS T1Y, T2.Y AS T2Y
FROM T1 JOIN T2 ON T1.X = T2.X
ORDER BY T1.Y, T2.Y
Query 3
WITH T AS
(
SELECT TOP 1 WITH TIES T1.Y AS T1Y, T2.Y AS T2Y
FROM T1 JOIN T2 ON T1.X = T2.X
ORDER BY T1.Y
)
SELECT TOP 1 *
FROM T
ORDER BY T2Y
Query 1 just picks off the TOP 1
from the index in the desired sort order and does the needed joins on the other table for that row. If the join is successful it stops there otherwise it tries the next one in index order until it finds a row that matches or runs out of rows.
Query 2 When adding the new sort column this plan is no longer valid as there could be multiple matches tied with the TOP 1
value and SQL Server decides to join the whole lot and then get the TOP 1
from that.
Query 3 This encourages SQL Server to stick with the first strategy and then just does a TOP 1 Sort
on any rows tied with the same value for the first sort key.
For my example data Query 3 works out better than Query 2 but if you have many duplicates tied for the value of the first sort key your milage may differ.
You can try this rewrite and see how it fares
WITH T
AS (SELECT TOP 1 WITH TIES so.OrgType,
ch.Status,
rcs.DBSstatusDescription,
cid.ApplicationId,
ch.DateAdded AS chDateAdded
FROM tbl_application_crb_initialData cid
INNER JOIN tbl_season_organisationId so
ON cid.OrganisationId = so.OrgId
LEFT JOIN tbl_crbHistory ch
ON cid.ClientReference = ch.ClientReference
LEFT JOIN ref_crbStatus rcs
ON ch.Status = rcs.statusId
ORDER BY cid.DateAdded DESC)
SELECT TOP 1 OrgType,
Status,
DBSstatusDescription,
ApplicationId
FROM T
ORDER BY chDateAdded DESC