Same query, different servers, different execution plans, statistics are up to date
16-10-2019
Question
I've got the same query running on two different servers and am getting vastly different performance. I have updated all of the statistics on dependent objects, and it has not fixed the issue. I'm not sure where to turn next.
Here's what the plans look like, the top being the dev execution (the one that works), and the bottom being the test execution (the slow one).
Edit: here's what the SQL looks like:
ALTER procedure [ardb].[NewProjects]
    -- Add the parameters for the stored procedure here
as
begin
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    set NOCOUNT on ;
    -- Insert statements for procedure here
    ;with b as
    (
        select distinct
            coalesce(a.STUDY_NUMBER,b.STUDY_NUMBER, c.study_number) as StudyNumber
        from
        (
            SELECT SUBSTRING(PAPROJNUMBER, 1, 5) AS STUDY_NUMBER
            FROM appdb.PROD.dbo.PA01201 AS PA01201_1
            where PAPROJNUMBER>'0'
              and PAPROJNUMBER not like '%[a-z]%'
        ) a
        full outer join
        (
            SELECT SUBSTRING(PAPROJNUMBER, 1, 5) AS STUDY_NUMBER
            FROM appdb.MO1.dbo.PA01201 AS PA01201_1
            where PAPROJNUMBER>'0'
              and PAPROJNUMBER not like '%[a-z]%'
        ) b
            on a.STUDY_NUMBER=b.STUDY_NUMBER
        full outer join
        (
            SELECT STUDY_NUMBER
            FROM appdb.ptwdb.dbo.tblARCompletedStudies
            where STUDY_NUMBER>'0'
        ) c
            on a.STUDY_NUMBER=c.study_number
           and b.STUDY_NUMBER=c.study_number
    )
    select
        snStudyNumber as ProjectNumber,
        c.clName as CompanyName,
        q.quQuoteNumber as QuoteNumber,
        q.quQuoteID as QuoteID,
        q.quDateWon as DateWon,
        s.snPTWCompletionDate as PTWCompletionDate
    from CDB.cdb.Quotes q
    inner join CDB.cdb.LineItems l
        on q.PK_quID=l.FK_quID
    inner join CDB.cdb.StudyNumbers s
        on l.FK_snID=s.PK_snId
    inner join cdb.cdb.Clients c
        on q.FK_clID=c.PK_clID
    left outer join ardb.projects p
        on s.snStudyNumber=p.prProjectNumber
    left outer join b
        on s.snStudyNumber=b.studynumber
    where q.quDateWon>''
      and s.snPTWCompletionDate >''
      and p.PK_prID is null
      and l.liCreationDate>'12/31/06'
      and b.studynumber is null
    union all
    select CAST(ProjectNumber AS int) as ProjectNumber,
        null as CompanyName,
        null as QuoteNumber,
        null as QuoteID,
        null as DateWon,
        null as PTWCompletionDate
    from history.ProjectsToProcess a
    left outer join ardb.projects p
        on a.ProjectNumber=p.prProjectNumber
    where p.PK_prID is null
    order by ProjectNumber desc
end
XML plans: here
Solution
In the slow plan, the plan sub-tree that contains all the remote queries is executed 2928 times. It would have been 3632 times (the number of rows on the outer input to the nested loops join that has the remote query sub-tree on its inner side), but the Distinct Sort is able to rewind (reuse) the prior iteration's results on a number of occasions.
In the fast plan, the join is a hash join rather than nested loops, so the problematic sub-tree is executed only once. As usual, the likely cause is small differences in cardinality estimation (visible in the plan as estimated row counts) and in the derived histograms and distribution statistics (not so visible). This variation is probably due to slightly different samples taken to build statistics, or to statistics being built at different times on dev and test.
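One way to check whether the statistics on the two servers really match is to compare their metadata side by side. This is a sketch, not part of the original answer: it assumes SQL Server 2008 R2 SP2 / 2012 SP1 or later (for `sys.dm_db_stats_properties`) and uses the `cdb` tables named in the question. Run it in the CDB database on both dev and test.

```sql
-- Run in the CDB database on dev and on test, then compare the two result sets.
-- Assumes sys.dm_db_stats_properties is available (2008 R2 SP2 / 2012 SP1+).
use CDB;

select
    object_schema_name(st.object_id) as schema_name,
    object_name(st.object_id)        as table_name,
    st.name                          as stats_name,
    sp.last_updated,                 -- when each statistics object was last built
    sp.rows,                         -- rows in the table at that time
    sp.rows_sampled,                 -- rows actually read to build the histogram
    sp.steps,                        -- number of histogram steps
    sp.modification_counter          -- changes since the last update
from sys.stats as st
cross apply sys.dm_db_stats_properties(st.object_id, st.stats_id) as sp
where st.object_id in
(
    object_id(N'cdb.Quotes'),
    object_id(N'cdb.LineItems'),
    object_id(N'cdb.StudyNumbers'),
    object_id(N'cdb.Clients')
)
order by table_name, stats_name;

-- To take sampling out of the picture, rebuild with a full scan on both
-- servers (repeat for each table listed above):
update statistics cdb.Quotes with fullscan;
```

If `rows_sampled` is well below `rows`, or `last_updated` differs noticeably between the servers, that is consistent with the sampling explanation above. A full scan costs more to build, but gives both servers histograms built from all of the data.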
To quickly validate this, you could add OPTION (HASH JOIN) to the end of the query (after the ORDER BY clause). Note that this query-level hint forces every join in the statement to be a hash join. There is a small risk that the optimizer will complain that it cannot produce a plan with that hint. I haven't analysed the plan deeply enough to predict that, and this suggestion is just for demonstration purposes. It is not my suggested solution.
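Because the procedure's statement is a UNION ALL ending in an ORDER BY, the placement of the hint can be confusing: the OPTION clause appears once, after the ORDER BY, and applies to the whole statement. A minimal, self-contained illustration, using hypothetical table variables rather than the real tables:

```sql
-- Hypothetical stand-ins for the real tables; the names are illustrative only.
declare @studies  table (StudyNumber   varchar(5) not null);
declare @projects table (ProjectNumber varchar(5) not null);

insert into @studies  (StudyNumber)   values ('10001'), ('10002'), ('10003');
insert into @projects (ProjectNumber) values ('10002');

-- Same anti-join shape as the procedure: studies with no matching project.
select s.StudyNumber
from @studies as s
left outer join @projects as p
    on s.StudyNumber = p.ProjectNumber
where p.ProjectNumber is null
union all
select '99999'
order by StudyNumber desc
-- The hint goes at the very end of the statement, after ORDER BY,
-- and applies to every join in every branch of the UNION ALL.
option (hash join);
```

In the real procedure the same `option (hash join)` line would follow `order by ProjectNumber desc`, before `end`.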
A better fix is to break the query into sections. Refactor the remote query sub-tree processing into a separate query, and store the 6737 rows it produces in a temporary table. You will probably want to use a #temporary table rather than a table variable, since you will get auto-generated statistics, and you will have a wider range of indexing possibilities, should that prove to be useful.
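A sketch of that refactor is below. It assumes the linked server (`appdb`) and every object name are exactly as in the question; `#RemoteStudies`, `cx_RemoteStudies`, and the alias `rs` are hypothetical names chosen for illustration. The remote sub-tree (formerly the CTE `b`) is evaluated once into a #temporary table, and the local query then joins to that.

```sql
-- Sketch only: object names come from the question; #RemoteStudies,
-- cx_RemoteStudies and the alias rs are hypothetical.
ALTER procedure [ardb].[NewProjects]
as
begin
    set nocount on;

    -- Step 1: evaluate the remote sub-tree once and materialize its rows.
    -- A #temp table gets auto-created statistics; a table variable would not.
    select distinct
        coalesce(a.STUDY_NUMBER, b.STUDY_NUMBER, c.study_number) as StudyNumber
    into #RemoteStudies
    from
    (
        select SUBSTRING(PAPROJNUMBER, 1, 5) as STUDY_NUMBER
        from appdb.PROD.dbo.PA01201
        where PAPROJNUMBER > '0'
          and PAPROJNUMBER not like '%[a-z]%'
    ) as a
    full outer join
    (
        select SUBSTRING(PAPROJNUMBER, 1, 5) as STUDY_NUMBER
        from appdb.MO1.dbo.PA01201
        where PAPROJNUMBER > '0'
          and PAPROJNUMBER not like '%[a-z]%'
    ) as b
        on a.STUDY_NUMBER = b.STUDY_NUMBER
    full outer join
    (
        select STUDY_NUMBER
        from appdb.ptwdb.dbo.tblARCompletedStudies
        where STUDY_NUMBER > '0'
    ) as c
        on a.STUDY_NUMBER = c.study_number
       and b.STUDY_NUMBER = c.study_number;

    -- Optional: index the join column, if testing shows it helps.
    create clustered index cx_RemoteStudies on #RemoteStudies (StudyNumber);

    -- Step 2: the original outer query, now joined to the local #temp table.
    select
        s.snStudyNumber as ProjectNumber,
        c.clName as CompanyName,
        q.quQuoteNumber as QuoteNumber,
        q.quQuoteID as QuoteID,
        q.quDateWon as DateWon,
        s.snPTWCompletionDate as PTWCompletionDate
    from CDB.cdb.Quotes as q
    inner join CDB.cdb.LineItems as l
        on q.PK_quID = l.FK_quID
    inner join CDB.cdb.StudyNumbers as s
        on l.FK_snID = s.PK_snId
    inner join cdb.cdb.Clients as c
        on q.FK_clID = c.PK_clID
    left outer join ardb.projects as p
        on s.snStudyNumber = p.prProjectNumber
    left outer join #RemoteStudies as rs
        on s.snStudyNumber = rs.StudyNumber
    where q.quDateWon > ''
      and s.snPTWCompletionDate > ''
      and p.PK_prID is null
      and l.liCreationDate > '12/31/06'
      and rs.StudyNumber is null
    union all
    select CAST(a.ProjectNumber AS int) as ProjectNumber,
        null, null, null, null, null
    from history.ProjectsToProcess as a
    left outer join ardb.projects as p
        on a.ProjectNumber = p.prProjectNumber
    where p.PK_prID is null
    order by ProjectNumber desc;
    -- #RemoteStudies is dropped automatically when the procedure ends.
end
```

The filters and join logic are copied unchanged from the question; only the remote part has moved into its own statement, so the optimizer sees a small local table with statistics instead of a sub-tree of remote queries it might re-execute on the inner side of a nested loops join.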
Queries with many joins and remote queries have a very large number of possible arrangements. By breaking the query up sensibly, and providing statistics (and perhaps indexes) you are giving the optimizer better information and a smaller search space to explore. Your chances of a good plan go up commensurately.
OTHER TIPS
It looks like your issue may be with the Remote Query showplan operators. They are the most expensive operators in the slow plan, at 38.9% and 14.5% of the estimated cost. Can you give us a bit more information about the source of those queries and how they are executed?
Run a trace with SQL Server Profiler to see where the longest durations are in the slow query, and compare them with those of the fast query. You'll be able to pinpoint the most expensive statements or procedures, which will narrow down your troubleshooting.
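If setting up a trace is inconvenient, a lighter-weight alternative (a different technique from the Profiler trace suggested above) is to run the procedure with STATISTICS TIME and STATISTICS IO enabled on each server and compare the per-statement figures in the Messages output. Reads performed on the linked server are likely not reported in the local IO figures, so the remote part will show up mainly as elapsed time.

```sql
-- Run on dev and on test, then compare the Messages output of each.
set statistics io on;    -- logical/physical reads per local table
set statistics time on;  -- CPU and elapsed time per statement

exec ardb.NewProjects;

set statistics time off;
set statistics io off;
```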