Same queries, different servers, different exec plans, stats are up to date

https://dba.stackexchange.com/questions/9234

16-10-2019

Question

I've got the same query running on two different servers, with vastly different performance. I have updated all of the statistics on the dependent objects, but that has not fixed the issue. I'm not sure where to turn next.

Here's what the plans look like, the top being the dev execution (the one that works), and the bottom being the test execution (the slow one).

Exec Plan

edit: here's what the sql looks like:

ALTER procedure [ardb].[NewProjects] 
    -- Add the parameters for the stored procedure here
as 
begin
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    set NOCOUNT on ;

    -- Insert statements for procedure here



;with b as
(
    select distinct
        coalesce(a.STUDY_NUMBER,b.STUDY_NUMBER, c.study_number) as StudyNumber
    from
    (
        SELECT     SUBSTRING(PAPROJNUMBER, 1, 5) AS STUDY_NUMBER
        FROM         appdb.PROD.dbo.PA01201 AS PA01201_1
        where PAPROJNUMBER>'0'
            and PAPROJNUMBER not like '%[a-z]%'
    ) a
        full outer join
        (
            SELECT     SUBSTRING(PAPROJNUMBER, 1, 5) AS STUDY_NUMBER
            FROM        appdb.MO1.dbo.PA01201 AS PA01201_1
            where PAPROJNUMBER>'0'
                and PAPROJNUMBER not like '%[a-z]%'
        ) b
            on a.STUDY_NUMBER=b.STUDY_NUMBER
        full outer join 
        (
            SELECT     STUDY_NUMBER
            FROM         appdb.ptwdb.dbo.tblARCompletedStudies
            where STUDY_NUMBER>'0'
        ) c
            on a.STUDY_NUMBER=c.study_number
                and b.STUDY_NUMBER=c.study_number
)

select
    snStudyNumber as ProjectNumber,
    c.clName as CompanyName,
    q.quQuoteNumber as QuoteNumber,
    q.quQuoteID as QuoteID,
    q.quDateWon as DateWon,
    s.snPTWCompletionDate as PTWCompletionDate
from CDB.cdb.Quotes q
    inner join CDB.cdb.LineItems l
        on q.PK_quID=l.FK_quID
    inner join CDB.cdb.StudyNumbers s
        on l.FK_snID=s.PK_snId
    inner join cdb.cdb.Clients c
        on q.FK_clID=c.PK_clID
    left outer join ardb.projects p
        on s.snStudyNumber=p.prProjectNumber
    left outer join b
        on s.snStudyNumber=b.studynumber
where q.quDateWon>''
    and s.snPTWCompletionDate >''
    and p.PK_prID is null
    and l.liCreationDate>'12/31/06'
    and b.studynumber is null
union all

select CAST(ProjectNumber AS int) as ProjectNumber,
    null as CompanyName,
    null as QuoteNumber,
    null as QuoteID,
    null as DateWon,
    null as PTWCompletionDate
from history.ProjectsToProcess a
    left outer join ardb.projects p
        on a.ProjectNumber=p.prProjectNumber
where p.PK_prID is null

order by ProjectNumber desc





end

xml plans: here

Solution

In the slow plan, the plan sub-tree that contains all the remote queries is executed 2928 times. It would have been 3632 times (the number of rows on the outer input to the nested loops join that has the remote query sub-tree on its inner side), but the Distinct Sort is able to rewind (reuse) the prior iteration's results on a number of occasions.

In the fast plan, the join is a hash join rather than nested loops, so the problematic sub-tree is only executed once. As usual, the cause is likely the small differences in cardinality estimation (visible in the plan as estimated row counts) and derived histograms and distribution statistics (not so visible). This variation is probably due to slightly different samples taken to build statistics, or statistics built at different times between dev and test.

To quickly validate this, you could add OPTION (HASH JOIN) to the query. Because the statement is a UNION ALL, the hint goes once at the very end, after the ORDER BY, and it applies to every join in the statement. There is a small risk that the optimizer will complain that it cannot produce a plan with that hint. I haven't analysed the plan deeply enough to predict that, and this suggestion is just for demonstration purposes. It is not my suggested solution.

A better fix is to break the query into sections. Refactor the remote query sub-tree processing into a separate query, and store the 6737 rows it produces in a temporary table. You will probably want to use a #temporary table rather than a table variable, since you will get auto-generated statistics, and you will have a wider range of indexing possibilities, should that prove to be useful.
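
As a rough sketch of that refactoring (not tested against your servers), the body of the procedure could look something like the following. The temporary table name #RemoteStudies and the index name IX_RemoteStudies are made up for illustration. The index is optional and only worth keeping if it turns out to help. Everything else is copied from the query in the question.

```sql
-- Step 1: run the remote sub-tree exactly once and keep the result locally.
-- #RemoteStudies is a hypothetical name chosen for this sketch.
if object_id('tempdb..#RemoteStudies') is not null
    drop table #RemoteStudies;

select distinct
    coalesce(a.STUDY_NUMBER, b.STUDY_NUMBER, c.STUDY_NUMBER) as StudyNumber
into #RemoteStudies
from
(
    select substring(PAPROJNUMBER, 1, 5) as STUDY_NUMBER
    from appdb.PROD.dbo.PA01201
    where PAPROJNUMBER > '0'
        and PAPROJNUMBER not like '%[a-z]%'
) a
full outer join
(
    select substring(PAPROJNUMBER, 1, 5) as STUDY_NUMBER
    from appdb.MO1.dbo.PA01201
    where PAPROJNUMBER > '0'
        and PAPROJNUMBER not like '%[a-z]%'
) b
    on a.STUDY_NUMBER = b.STUDY_NUMBER
full outer join
(
    select STUDY_NUMBER
    from appdb.ptwdb.dbo.tblARCompletedStudies
    where STUDY_NUMBER > '0'
) c
    on a.STUDY_NUMBER = c.STUDY_NUMBER
        and b.STUDY_NUMBER = c.STUDY_NUMBER;

-- Optional: an index on the join column (hypothetical name, drop it if it does not help).
create clustered index IX_RemoteStudies on #RemoteStudies (StudyNumber);

-- Step 2: the original query, joining to the local temporary table instead of the CTE.
select
    s.snStudyNumber as ProjectNumber,
    c.clName as CompanyName,
    q.quQuoteNumber as QuoteNumber,
    q.quQuoteID as QuoteID,
    q.quDateWon as DateWon,
    s.snPTWCompletionDate as PTWCompletionDate
from CDB.cdb.Quotes q
    inner join CDB.cdb.LineItems l
        on q.PK_quID = l.FK_quID
    inner join CDB.cdb.StudyNumbers s
        on l.FK_snID = s.PK_snId
    inner join cdb.cdb.Clients c
        on q.FK_clID = c.PK_clID
    left outer join ardb.projects p
        on s.snStudyNumber = p.prProjectNumber
    left outer join #RemoteStudies b
        on s.snStudyNumber = b.StudyNumber
where q.quDateWon > ''
    and s.snPTWCompletionDate > ''
    and p.PK_prID is null
    and l.liCreationDate > '12/31/06'
    and b.StudyNumber is null
union all
select cast(a.ProjectNumber as int) as ProjectNumber,
    null as CompanyName,
    null as QuoteNumber,
    null as QuoteID,
    null as DateWon,
    null as PTWCompletionDate
from history.ProjectsToProcess a
    left outer join ardb.projects p
        on a.ProjectNumber = p.prProjectNumber
where p.PK_prID is null
order by ProjectNumber desc;
```

The drop guard at the top is only there so the script can be rerun in the same session while testing. Inside the stored procedure the temporary table is cleaned up automatically when the procedure ends.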

Queries with many joins and remote queries have a very large number of possible arrangements. By breaking the query up sensibly, and providing statistics (and perhaps indexes) you are giving the optimizer better information and a smaller search space to explore. Your chances of a good plan go up commensurately.

OTHER TIPS

It looks like your issue may be with the Remote Query Showplan Operators. They are the most expensive, pegging out at 38.9% and 14.5% for the slow queries. Can you give us a bit more information as to the source of those queries and the execution based on them?

Run a trace with SQL Profiler to see where the longest durations are on the slow query, and compare them to that of the faster query. You'll be able to pinpoint the statements/procedures that are the most expensive and that'll narrow down your troubleshooting.

Licensed under: CC-BY-SA with attribution
