Question

We have a query that is taking around 5 sec on our production system, but on our mirror system (as identical as possible to production) and dev systems it takes under 1 second.

We have checked the query plans and can see that they differ. From these plans we can also see why one is taking longer than the other. The data, schema and servers are similar, and the stored procedures are identical.

We know how to fix it by rearranging the joins and adding hints. However, at the moment it would be easier if we didn't have to make any changes to the SProc (paperwork). We have also tried sp_recompile.

What could cause the difference between the two query plans?

System: SQL 2005 SP2 Enterprise on Win2k3 Enterprise

Update: Thanks for your responses, it turns out that it was statistics. See summary below.


Solution

Your statistics are most likely out of date. If your data is the same, recompute the statistics on both servers and recompile. You should then see identical query plans.
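As a rough sketch of what "recompute the statistics and recompile" could look like on SQL Server 2005: the object names dbo.Orders and dbo.usp_GetOrders below are hypothetical placeholders, not names from the question.

```sql
-- Hypothetical names: dbo.Orders and dbo.usp_GetOrders stand in for your own objects.

-- Refresh the statistics on one table, reading every row instead of sampling.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Or refresh out-of-date statistics across the user tables
-- in the current database, using the default sampling.
EXEC sp_updatestats;

-- Mark the procedure so that a new plan is compiled the next time it runs.
EXEC sp_recompile N'dbo.usp_GetOrders';
```

Run the same steps on both servers before comparing the plans again, so that both optimizers start from equally fresh statistics.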

Also, double-check that your indexes are identical.
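One way to compare indexes is to list them from the catalog views on each server and diff the two result sets. This is only a sketch; it covers all user tables in the current database and ignores details such as fill factor and filegroup placement.

```sql
-- List every index on user tables with its key and included columns.
-- Run on both servers and compare the output.
SELECT  OBJECT_NAME(i.object_id) AS table_name,
        i.name                   AS index_name,
        i.type_desc,
        i.is_unique,
        ic.key_ordinal,
        c.name                   AS column_name,
        ic.is_descending_key,
        ic.is_included_column
FROM    sys.indexes i
JOIN    sys.index_columns ic
        ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN    sys.columns c
        ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, index_name, ic.is_included_column, ic.key_ordinal;
```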

OTHER TIPS

Most likely statistics.

Some thoughts: Do you do maintenance on your non-prod systems? (e.g. rebuild indexes, which will also rebuild their statistics)

If so, do you use the same fillfactor and statistics sample ratio?

Do you restore the database regularly onto test so it's 100% like production?

Are the data and data size on your mirror and production systems as close to the same as possible? Since you know why one query is taking longer than the other, can you post some more details?

Execution plans can be different in such cases because of the data in the tables and/or the statistics. Even in cases where auto update statistics is turned on, the statistics can get out of date (especially in very large tables). You may find that the optimizer has estimated that a table is not that large and opted for a table scan or something like that.
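To see how stale the statistics actually are, something like the following could be run on each server. It is a sketch that only reports the last update time; it does not tell you how many rows have changed since then. The names in the DBCC line are hypothetical placeholders.

```sql
-- When were the statistics on each user table last updated?
-- Oldest entries (and never-updated ones, shown as NULL) sort first.
SELECT  OBJECT_NAME(s.object_id)            AS table_name,
        s.name                              AS stats_name,
        STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM    sys.stats s
WHERE   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY last_updated;

-- Inspect one statistics object in detail (hypothetical table and index names).
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerId');
```

A large gap between the dates on production and on the mirror is a strong hint that the optimizers are working from different row estimates.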

Provided there is no WITH RECOMPILE option on your proc, the execution plan will get cached after the first execution.
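If you want to look at the plan that is currently cached for a procedure, the dynamic management views in SQL Server 2005 can show it. The sketch below assumes a procedure named spTest in the current database; substitute your own procedure name.

```sql
-- Show the cached plan for one stored procedure, plus how often it was reused.
SELECT  cp.usecounts,
        cp.objtype,
        st.text,
        qp.query_plan          -- XML showplan; click it in Management Studio
FROM    sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE   st.dbid = DB_ID()
  AND   st.objectid = OBJECT_ID(N'spTest');
```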

Here is a trivial example on how you can get the wrong query plan cached:

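create proc spTest
    @id int 
as 
-- The predicate has to work for a null @id too, and the plan built
-- for the first call is cached and reused for later calls.
select * from sysobjects where @id is null or id = @id 

go 

exec spTest null
-- As expected, it's a clustered index scan

go

exec spTest 1
-- Oh no, it's still a clustered index scan 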

Try running your SQL in QA (Query Analyzer, or a Management Studio query window) on the production server, outside of the stored proc, to determine whether you have an issue with out-of-date statistics or indexes that are mysteriously missing from production.
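A minimal sketch of such an ad hoc test follows. The table, columns and parameter value are hypothetical; paste in the body of your own procedure instead.

```sql
-- Report I/O and timing for each statement in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in for the procedure's parameter (hypothetical name and value).
DECLARE @CustomerId int;
SET @CustomerId = 42;

-- Stand-in for the procedure body (hypothetical table and columns).
SELECT  o.OrderId, o.OrderDate
FROM    dbo.Orders o
WHERE   o.CustomerId = @CustomerId;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

One caveat: values in local variables are not sniffed the way procedure parameters are, so the ad hoc plan may differ from the procedure's plan for that reason alone. Replacing the variable with a literal value gives a second data point.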

Tying in to the first answer, the problem may lie with SQL Server's Parameter Sniffing feature. It uses the first value that caused compilation to help create the execution plan. Usually this is good but if the value is not normal (or somehow strange), it can contribute to a bad plan. This would also explain the difference between production and testing.

Turning off parameter sniffing would require modifying the SProc which I understand is undesirable. However, after using sp_recompile, pass in parameters that you'd consider "normal" and it should recompile based off of these new parameters.
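In practice that could look like the sketch below, where dbo.usp_GetOrders and @CustomerId = 42 are hypothetical stand-ins for your procedure and a representative parameter value.

```sql
-- Flag the procedure for recompilation on its next execution.
EXEC sp_recompile N'dbo.usp_GetOrders';

-- Run it straight away with a typical value, so that this is the value
-- the optimizer sees when it builds the new plan.
EXEC dbo.usp_GetOrders @CustomerId = 42;
```

On a busy production server another caller may execute the procedure between the two statements, in which case the plan is built from that caller's parameters instead.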

I think the parameter sniffing behavior is different between 2005 and 2008 so this may not work.

The solution was to recalculate the statistics. I overlooked that because we usually have scheduled tasks to do all of that, but for some reason the admins didn't put one on this server. Doh.

To summarize all the posts:

  • Check the setup is the same
    • Indexes
    • Table sizes
    • Restore Database
  • Execution Plan Caching
    • If the query runs the same outside the SProc, it's not the Execution Plan
    • sp_recompile if it is different
    • Parameter sniffing
  • Recompute Statistics
Licensed under: CC-BY-SA with attribution