SQL Server query plan differences

https://stackoverflow.com/questions/1667163

13-09-2019
|

Question

I'm having trouble understanding the behavior of the estimated query plans for my statement in SQL Server when a change from a parameterized query to a non-parameterized query.

I have the following query:

DECLARE @p0 UniqueIdentifier = '1fc66e37-6eaf-4032-b374-e7b60fbd25ea'
SELECT [t5].[value2] AS [Date], [t5].[value] AS [New]
FROM (
    SELECT COUNT(*) AS [value], [t4].[value] AS [value2]
    FROM (
        SELECT CONVERT(DATE, [t3].[ServerTime]) AS [value]
        FROM (
            SELECT [t0].[CookieID]
            FROM [dbo].[Usage] AS [t0]
            WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
            GROUP BY [t0].[CookieID]
            ) AS [t1]
        OUTER APPLY (
            SELECT TOP (1) [t2].[ServerTime]
            FROM [dbo].[Usage] AS [t2]
            WHERE ((([t1].[CookieID] IS NULL) AND ([t2].[CookieID] IS NULL)) 
            OR (([t1].[CookieID] IS NOT NULL) AND ([t2].[CookieID] IS NOT NULL) 
            AND ([t1].[CookieID] = [t2].[CookieID]))) 
            AND ([t2].[CookieID] IS NOT NULL)          
            AND ([t2].[ProductID] = @p0)
            ORDER BY [t2].[ServerTime]
            ) AS [t3]
        ) AS [t4]
    GROUP BY [t4].[value]
    ) AS [t5]
ORDER BY [t5].[value2]

This query is generated by a Linq2SQL expression and extracted from LINQPad. This produces a nice query plan (as far as I can tell) and executes in about 10 seconds on the database. However, if I replace the two uses of parameters with the exact value, that is replace the two '= @p0' parts with '= '1fc66e37-6eaf-4032-b374-e7b60fbd25ea' ' I get a different estimated query plan and the query now runs much longer (more than 60 seconds, haven't seen it through).

Why is it that performing the seemingly innocent replacement produces a much less efficient query plan and execution? I have cleared the procedure cache with 'DBCC FreeProcCache' to ensure that I was not caching a bad plan, but the behavior remains.

My real problem is that I can live with the 10 seconds execution time (at least for a good while) but I can't live with the 60+ sec execution time. My query will (as hinted above) by produced by Linq2SQL so it is executed on the database as

exec sp_executesql N'
        ...
        WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
        ...
        AND ([t2].[ProductID] = @p0)
        ...
       ',N'@p0 uniqueidentifier',@p0='1FC66E37-6EAF-4032-B374-E7B60FBD25EA'

which produces the same poor execution time (which I think is doubly strange since this seems to be using parameterized queries.

I'm not looking for advise on which indexes to create or the like, I'm just trying to understand why the query plan and execution are so dissimilar on three seemingly similar queries.

EDIT: I have uploaded execution plans for the non-parameterized and the parameterized query as well as an execution plan for a parameterized query (as suggested by Heinz) with a different GUID here

Hope it helps you help me :)

Solution

I'm not looking for advise on which indexes to create or the like, I'm just trying to understand why the query plan and execution are so dissimilar on three seemingly similar queries.

You seem to have two indexes:

IX_NonCluster_Config (ProductID, ServerTime)
IX_NonCluster_ProductID_CookieID_With_ServerTime (ProductID, CookieID) INCLUDE (ServerTime)

The first index does not cover CookieID but is ordered on ServerTime and hence is more efficient for the less selective ProductID's (i. e. those that you have many)

The second index does cover all columns but is not ordered, and hence is more efficient for more selective ProductID's (those that you have few).

In average, you ProductID cardinality is so that SQL Server expects the second method to be efficient, which is what it uses when you use parametrized queries or explicitly provide selective GUID's.

However, your original GUID is considered less selective, that's why the first method is used.

Unfortunately, the first method requires additional filtering on CookieID which is why it's less efficient in fact.

OTHER TIPS

If you provide an explicit value, SQL Server can use statistics of this field to make a "better" query plan decision. Unfortunately (as I've experienced myself recently), if the information contained in the statistics is misleading, sometimes SQL Server just makes the wrong choices.

If you want to dig deeper into this issue, I recommend you to check what happens if you use other GUIDs: If it uses a different query plan for different concrete GUIDs, that's an indication that statistics data is used. In that case, you might want to look at sp_updatestats and related commands.

EDIT: Have a look at DBCC SHOW_STATISTICS: The "slow" and the "fast" GUID are probably in different buckets in the histogram. I've had a similar problem, which I solved by adding an INDEX table hint to the SQL, which "guides" SQL Server towards finding the "right" query plan. Basically, I've looked at what indices are used during a "fast" query and hard-coded those into the SQL. This is far from an optimal or elegant solution, but I haven't found a better one yet...

My guess is that when you take the non paramaterized route, your guid has to be converted from a varchar to a UniqueIdentifier which may cause an index not to be used, while it will be used taking the paramatarized route.

I've seen this happen with using queries that have a smalldatetime in the where clause against a column that uses a datetime.

Its difficult to tell without looking at the execution plans, however if I was going to guess at a reason I'd say that its a combinaton of parameter sniffing and poor statistics - In the case where you hard-code the GUID into the query, the query optimiser attempts to optimise the query for that value of the parameter. I believe that the same thing happens with the parameterised / prepared query (this is called parameter sniffing - the execution plan is optimised for the parameters used the first time that the prepared statement is executed), however this definitely doesn't happen when you declare the parameter and use it in the query.

Like I said, SQL server attempt to optimise the execution plan for that value, and so usually you should see better results. It seems here that that information it is basing its decisions on is incorrect / misleading, and you are better off (for some reason) when it optimises the query for a generic parameter value.

This is mostly guesswork however - its impossible to tell really without the execution - if you can upload the executuion plan somewhere then I'm sure someone will be able to help you with the real reason.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow