Question

In SQL Server 2012, I have three tables, Foos, Lookup1 and Lookup2, created with the following SQL:

CREATE TABLE Foos (
    Id int NOT NULL,
    L1 int NOT NULL,
    L2 int NOT NULL,
    Value int NOT NULL,
    CONSTRAINT PK_Foos PRIMARY KEY CLUSTERED (Id ASC)
);

CREATE TABLE Lookup1 (
    Id int NOT NULL,
    Name nvarchar(50) NOT NULL,
    CONSTRAINT PK_Lookup1 PRIMARY KEY CLUSTERED (Id ASC),
    CONSTRAINT IX_Lookup1 UNIQUE NONCLUSTERED (Name ASC)
);

CREATE TABLE Lookup2 (
    Id int NOT NULL,
    Name nvarchar(50) NOT NULL,
    CONSTRAINT PK_Lookup2 PRIMARY KEY CLUSTERED (Id ASC),
    CONSTRAINT IX_Lookup2 UNIQUE NONCLUSTERED (Name ASC)
);

CREATE NONCLUSTERED INDEX IX_Foos ON Foos (
    L1 ASC,
    L2 ASC,
    Value ASC
);

ALTER TABLE Foos WITH CHECK ADD CONSTRAINT FK_Foos_Lookup1 
    FOREIGN KEY(L2) REFERENCES Lookup1 (Id);

ALTER TABLE Foos CHECK CONSTRAINT FK_Foos_Lookup1;

ALTER TABLE Foos WITH CHECK ADD CONSTRAINT FK_Foos_Lookup2 
    FOREIGN KEY(L1) REFERENCES Lookup2 (Id);

ALTER TABLE Foos CHECK CONSTRAINT FK_Foos_Lookup2;

BAD PLAN:

The following SQL query to get Foos by the lookup tables:

select top(1) f.* from Foos f
join Lookup1 l1 on f.L1 = l1.Id
join Lookup2 l2 on f.L2 = l2.Id
where l1.Name = 'a' and l2.Name = 'b' 
order by f.Value

[Image: bad plan]

does not fully utilize the IX_Foos index; see http://sqlfiddle.com/#!6/cd5c1/1/0 and the plan with data. (It uses just one of the lookup tables.)


GOOD PLAN:

However if I rewrite the query:

declare @l1Id int = (select Id from Lookup1 where Name = 'a');
declare @l2Id int = (select Id from Lookup2 where Name = 'b');

select top(1) f.* from Foos f
where f.L1 = @l1Id and f.L2 = @l2Id 
order by f.Value

[Image: good plan]

it works as expected. It first looks up the Ids in both lookup tables and then uses them to seek the IX_Foos index.

Is it possible to use a hint to force SQL Server, in the first query (with joins), to look up the Ids first and then use them to seek IX_Foos?

This matters because, if the Foos table is quite large, the first query (with joins) locks the whole table.

NOTE: The inner join query comes from LINQ. Alternatively, is it possible to force LINQ in Entity Framework to rewrite the queries using declare? Doing the lookups in multiple requests could add round-trip delay in more complex queries.

NOTE2: In Oracle it works fine, so this seems to be a problem specific to SQL Server.

NOTE3: The locking issue is more apparent when TOP(1) is added to the select f.* from Foos ... query. (For instance, when you need to get only the min or max value.)


UPDATE: Following @Hoots' suggestion, I have changed IX_Lookup1 and IX_Lookup2:

CONSTRAINT IX_Lookup1 UNIQUE NONCLUSTERED (Name ASC, Id ASC)
CONSTRAINT IX_Lookup2 UNIQUE NONCLUSTERED (Name ASC, Id ASC)

It helps, but the plan is still sorting all of the results.

Why is it taking all 10,000 rows from Foos that match f.L1 and f.L2, instead of just taking the first row? (IX_Foos contains Value ASC, so it could find the first row without processing and sorting all 10,000 rows.) The previous plan with declared variables uses IX_Foos, so it does not do the sort.

Solution

Looking at the query plans, SQL Server uses the same indexes in both versions of the SQL you posted. The difference is that the second version executes three separate statements rather than one, so the indexes are evaluated at different times.

I have checked and I think the solution is to change the indexes as below...

CONSTRAINT IX_Lookup1 UNIQUE NONCLUSTERED (Name ASC, Id ASC)

and

CONSTRAINT IX_Lookup2 UNIQUE NONCLUSTERED (Name ASC, Id ASC)

With Id in the index, SQL Server will not need to go to the table data to get the Id when it evaluates the index. This changes the plan to what you want and hopefully prevents the locking you're seeing, but I can't guarantee that part, as the locking isn't something I can reproduce.
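As a minimal sketch of how that change could be applied to the tables from the question (constraint and column names are taken from the question's DDL), the existing constraints can be dropped and recreated. One side effect worth noting: a unique constraint on (Name, Id) no longer guarantees that Name is unique on its own, because Id is already unique as the primary key.

```sql
-- Sketch only: replace the single-column unique constraints with the
-- two-column versions suggested above. Run against a test copy first.
ALTER TABLE Lookup1 DROP CONSTRAINT IX_Lookup1;
ALTER TABLE Lookup1 ADD CONSTRAINT IX_Lookup1
    UNIQUE NONCLUSTERED (Name ASC, Id ASC);

ALTER TABLE Lookup2 DROP CONSTRAINT IX_Lookup2;
ALTER TABLE Lookup2 ADD CONSTRAINT IX_Lookup2
    UNIQUE NONCLUSTERED (Name ASC, Id ASC);
```

If duplicate names must still be rejected, that rule has to be enforced some other way once the constraint is widened.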

UPDATE: I now see the issue...

The second piece of SQL is effectively not using set-based operations. Simplified, what you are doing is...

select f.*
from Foos f
where f.L1 = 1
  and f.L2 = 1
order by f.Value desc

This only has to seek on a single index, and the results come back already ordered.

In the first piece of SQL (shown below) you are combining different data sets that only have indexes on the individual tables. The next two pieces of SQL do the same thing with the same query plan...

select f.* -- cost 0.7099
from Foos f
join Lookup1 l1 on f.L1 = l1.Id
join Lookup2 l2 on f.L2 = l2.Id
where l1.Name = 'a' and l2.Name = 'b' 
order by f.Value

select f.* -- cost 0.7099
from Foos f
inner join (SELECT l1.id l1Id, l2.id l2Id
            from Lookup1 l1, Lookup2 l2
            where l1.Name = 'a' and l2.Name='b') lookups on (f.L1 = lookups.l1Id and f.L2=lookups.l2Id)
order by f.Value desc

The reason I've shown both is that in the second version you can quite easily hint that the lookup is not set-based but a single row, by writing it like this...

select f.* -- cost 0.095
from Foos f
inner join (SELECT TOP 1 l1.id l1Id, l2.id l2Id
            from Lookup1 l1, Lookup2 l2
            where l1.Name = 'a' and l2.Name='b') lookups on (f.L1 = lookups.l1Id and f.L2=lookups.l2Id)
order by f.Value desc

Of course, you can only do this if you know that the subquery returns a single record whether or not TOP 1 is specified. This brings the cost down from 0.7099 to 0.095. I can only surmise that, now that the input is explicitly a single record, the optimiser knows the ordering can be handled by the index rather than by sorting the rows 'manually'.
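As a sketch of the same single-row idea kept in one statement, the two lookups can also be written as scalar subqueries in the WHERE clause. This is an untested variation rather than something taken from the plans above. It assumes Name stays unique in each lookup table, so that each subquery returns at most one Id; whether the optimiser then seeks IX_Foos without a sort needs to be checked against the actual execution plan.

```sql
-- Sketch only: single-statement variant of the "declare" version from the question.
-- Each scalar subquery must return at most one row; if Name is not unique,
-- SQL Server raises an error at run time instead of returning results.
select top(1) f.*
from Foos f
where f.L1 = (select l1.Id from Lookup1 l1 where l1.Name = 'a')
  and f.L2 = (select l2.Id from Lookup2 l2 where l2.Name = 'b')
order by f.Value;
```

If either name is missing, its subquery yields NULL and the query simply returns no rows.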

Note: 0.7099 isn't very large for a query that runs on its own, i.e. you'll hardly notice it, but if it's part of a larger set of executions you can bring the cost down if you like. I suspect the question is more about the reason why, which I believe comes down to set-based operations versus singular seeks.

Other suggestions

Try using a CTE like this:

with cte as
(select min(Value) as Value from Foos f
join Lookup1 l1 on f.L1 = l1.Id
join Lookup2 l2 on f.L2 = l2.Id
where l1.Name = 'a' and l2.Name = 'b')
select top(1) * from Foos where exists (select * from cte where cte.Value=Foos.Value)
option (recompile)

This roughly halves the logical reads from the Foos table and the execution time (see the measurements below).

set statistics io,time on

1) Your first query, with the indexes suggested by @Hoots:
   Estimated Subtree Cost = 0.888
   Table 'Foos'. Scan count 1, logical reads 59
   CPU time = 15 ms, elapsed time = 151 ms.

2) This CTE query, with the same indexes:
   Estimated Subtree Cost = 0.397
   Table 'Foos'. Scan count 2, logical reads 34
   CPU time = 15 ms, elapsed time = 66 ms.
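To reproduce figures like these, a minimal sketch is to switch the statistics output on, run both queries in the same session, and compare the text in the Messages output. The literal values 'a' and 'b' come from the question; the numbers you get will depend on your own data and hardware.

```sql
-- Sketch only: measure logical reads and timings for both queries.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- 1) the original join query
select top(1) f.* from Foos f
join Lookup1 l1 on f.L1 = l1.Id
join Lookup2 l2 on f.L2 = l2.Id
where l1.Name = 'a' and l2.Name = 'b'
order by f.Value;

-- 2) the CTE variant
with cte as
(select min(Value) as Value from Foos f
 join Lookup1 l1 on f.L1 = l1.Id
 join Lookup2 l2 on f.L2 = l2.Id
 where l1.Name = 'a' and l2.Name = 'b')
select top(1) * from Foos
where exists (select * from cte where cte.Value = Foos.Value)
option (recompile);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The estimated subtree cost is not part of this output; it comes from the root operator of the execution plan.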

However, with billions of rows in Foos this technique can be quite slow, because it touches the table twice, whereas your first query touches it only once.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow