Question

We have a production stored procedure that executes in 2-3 seconds everywhere except one client's environment.

Their environment appears healthy: 24 cores, 64 GB of RAM, and nowhere near capacity. The server runs SQL Server 2008 R2 SP2.

I restored the database in my test environment, where the sproc returns in 2 seconds; in the client's environment it takes 20-50 minutes.

Today I set up a new instance of SQL Server on the same server, and it also takes 20-50 minutes to execute the stored procedure.

Our DBA found the problem statement and devised a workaround.

Original:

--This returns in 30 minutes.
SELECT DISTINCT
          P.ProjectID, P.ProjectName
FROM DocumentRoute DR

LEFT JOIN Routes R
ON R.RouteID = DR.RouteID

INNER JOIN Documents D
ON D.DocumentID = DR.DocumentID
AND D.Status = 1

INNER JOIN Files F
ON F.FileID = D.FileID
AND F.Status = 1

INNER JOIN Projects P
ON P.ProjectID = F.ProjectID
AND P.Status = 1

LEFT OUTER JOIN Users U
ON U.UserID = DR.UserID

WHERE DR.Status = 1

Original Execution Plan here: http://screencast.com/t/xGcRIE9o

Workaround:

--This returns in 2 seconds.
SELECT DISTINCT
          P.ProjectID, P.ProjectName
FROM DocumentRoute DR

LEFT JOIN Routes R
ON R.RouteID = DR.RouteID
AND DR.Status = 1

INNER JOIN Documents D
ON D.DocumentID = DR.DocumentID
AND D.Status = 1

INNER JOIN Files F
ON F.FileID = D.FileID
AND F.Status = 1

INNER JOIN Projects P
ON P.ProjectID = F.ProjectID
AND P.Status = 1

LEFT OUTER JOIN Users U
ON U.UserID = DR.UserID

Revised Execution Plan here: http://screencast.com/t/Fqg90w6NDyZd

What in the client's environment could possibly account for the massive difference in execution time?

Additional info: When I got the execution plans from the client, the problem statement by itself finished in 17 seconds, but the entire sproc has been running for 15+ minutes and will probably take another 15 to finish.


Solution 2

Following @ErikE's advice, I scrutinized the client's execution plans for the multiple statements in the stored procedure and noticed several parallelism operators.

Knowing that their server had 24 processors versus 1 on mine, I had a light-bulb moment and tried a MAXDOP of 1. Behold: the execution time went from 45 minutes to 2 seconds.

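-- 'max degree of parallelism' is an advanced option, so make it visible first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
GO

-- Instance-wide setting: it applies to the whole server, not just this sproc.
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE WITH OVERRIDE;
GO

-- To limit only the problem statement, append OPTION (MAXDOP 1) to that query instead.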

OTHER TIPS

The first problem is that these two queries do not have the same meaning and can return different results. The first query only produces DocumentRoute rows where Status = 1. The second query keeps every DocumentRoute row (subject to the INNER JOINs that follow); where Status is NULL or not equal to 1, the row simply matches no Routes row, so the Routes columns come back NULL instead of the row being filtered out.
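A minimal sketch of the ON-versus-WHERE difference, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made so the example runs anywhere; outer-join semantics for a filter in ON versus in WHERE are standard SQL). The two tables and their rows are hypothetical, cut down to the columns that matter:

```python
import sqlite3


def compare_filter_placement():
    """Run the same LEFT JOIN with the Status filter in WHERE vs. in ON.

    Tables and rows are hypothetical, reduced to the columns the demo needs.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Routes (RouteID INTEGER PRIMARY KEY);
        CREATE TABLE DocumentRoute (DocumentID INTEGER, RouteID INTEGER, Status INTEGER);
        INSERT INTO Routes VALUES (10), (20);
        INSERT INTO DocumentRoute VALUES (1, 10, 1), (2, 20, 0), (3, 10, NULL);
    """)
    # Filter in WHERE (original query): rows whose Status is 0 or NULL are discarded.
    in_where = conn.execute("""
        SELECT DR.DocumentID, R.RouteID
        FROM DocumentRoute DR
        LEFT JOIN Routes R ON R.RouteID = DR.RouteID
        WHERE DR.Status = 1
        ORDER BY DR.DocumentID
    """).fetchall()
    # Filter in ON (workaround): every DocumentRoute row survives; when Status
    # is not 1 the join finds no match, so R.RouteID comes back NULL (None).
    in_on = conn.execute("""
        SELECT DR.DocumentID, R.RouteID
        FROM DocumentRoute DR
        LEFT JOIN Routes R ON R.RouteID = DR.RouteID AND DR.Status = 1
        ORDER BY DR.DocumentID
    """).fetchall()
    conn.close()
    return in_where, in_on


in_where, in_on = compare_filter_placement()
print(in_where)  # [(1, 10)]
print(in_on)     # [(1, 10), (2, None), (3, None)]
```

Rows 2 and 3 vanish under the WHERE filter but survive under the ON filter. In the full workaround query those surviving rows still flow into the INNER JOINs on Documents, Files and Projects, so they can add projects the original query would not return.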

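The second problem is that if you are only selecting columns from the Projects table using DISTINCT, the LEFT JOINs can't possibly change the results: a LEFT JOIN never removes rows from its left input, and any duplicate rows it adds are collapsed by DISTINCT. So you may as well remove them.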
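The DISTINCT argument can be checked on a small dataset. This sketch uses Python's built-in sqlite3 module as a stand-in for SQL Server, with a hypothetical miniature version of the question's tables (only the columns the query touches). Routes deliberately contains a duplicate RouteID, so the LEFT JOIN really does multiply rows before DISTINCT collapses them. One dataset illustrates the claim rather than proving it; the general reason is the one given above.

```python
import sqlite3

# Hypothetical miniature schema mirroring the question's tables.
SCHEMA = """
CREATE TABLE DocumentRoute (DocumentID INTEGER, RouteID INTEGER, UserID INTEGER, Status INTEGER);
CREATE TABLE Routes    (RouteID INTEGER);  -- deliberately not unique
CREATE TABLE Users     (UserID INTEGER);
CREATE TABLE Documents (DocumentID INTEGER, FileID INTEGER, Status INTEGER);
CREATE TABLE Files     (FileID INTEGER, ProjectID INTEGER, Status INTEGER);
CREATE TABLE Projects  (ProjectID INTEGER, ProjectName TEXT, Status INTEGER);
INSERT INTO Routes VALUES (10), (10), (20);
INSERT INTO Users VALUES (100);
INSERT INTO DocumentRoute VALUES (1, 10, 100, 1), (2, 99, NULL, 1), (3, 20, 100, 0);
INSERT INTO Documents VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1);
INSERT INTO Files VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1);
INSERT INTO Projects VALUES (1, 'Alpha', 1), (2, 'Beta', 1), (3, 'Gamma', 1);
"""

# The question's original query (ORDER BY added for a stable comparison).
WITH_LEFT_JOINS = """
SELECT DISTINCT P.ProjectID, P.ProjectName
FROM DocumentRoute DR
LEFT JOIN Routes R ON R.RouteID = DR.RouteID
INNER JOIN Documents D ON D.DocumentID = DR.DocumentID AND D.Status = 1
INNER JOIN Files F ON F.FileID = D.FileID AND F.Status = 1
INNER JOIN Projects P ON P.ProjectID = F.ProjectID AND P.Status = 1
LEFT OUTER JOIN Users U ON U.UserID = DR.UserID
WHERE DR.Status = 1
ORDER BY P.ProjectID
"""

# Same query with the two LEFT JOINs removed.
WITHOUT_LEFT_JOINS = """
SELECT DISTINCT P.ProjectID, P.ProjectName
FROM DocumentRoute DR
INNER JOIN Documents D ON D.DocumentID = DR.DocumentID AND D.Status = 1
INNER JOIN Files F ON F.FileID = D.FileID AND F.Status = 1
INNER JOIN Projects P ON P.ProjectID = F.ProjectID AND P.Status = 1
WHERE DR.Status = 1
ORDER BY P.ProjectID
"""


def run(query):
    """Build a fresh in-memory database and return all rows of `query`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(WITH_LEFT_JOINS))     # [(1, 'Alpha'), (2, 'Beta')]
print(run(WITHOUT_LEFT_JOINS))  # [(1, 'Alpha'), (2, 'Beta')]
```

Before DISTINCT, the duplicate RouteID 10 makes the first query emit Alpha twice, and DocumentRoute row 2 (whose RouteID 99 matches no route) still reaches Projects through the LEFT JOIN. Neither effect survives into the DISTINCT projection.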

Finally, without the execution plans for both queries, and possibly more detail about the structure of the tables involved, no one can say definitively what is going on. The plan from your environment (where it takes 2 seconds) won't help much; we need the plan from the environment where it runs slowly.

To capture an actual execution plan, first run SET STATISTICS XML ON; and then run the query. The plan is returned as an extra result set after the query's own results.

Some thoughts on what could be causing the problem:

  • Are statistics set to update automatically? If not, the server could choose a poor plan.
  • What is the fragmentation of the tables involved in the query?
  • Does the client database server have its tempdb files set up properly? If there is more than one, are they all the same size?

Licensed under: CC-BY-SA with attribution