Question

Why is this query faster in SQL Server 2008 R2 (Version 10.50.2806.0)

    SELECT
        MAX(AtDate1),
        MIN(AtDate2)
    FROM
    (
        SELECT TOP 1000000000000
            at.Date1 AS AtDate1,
            at.Date2 AS AtDate2
        FROM
            dbo.tab1 a
        INNER JOIN
            dbo.tab2 at
        ON
            a.id = at.RootId
        AND CAST(GETDATE() AS DATE) BETWEEN at.Date1 AND at.Date2
        WHERE
            a.Number = 223889
    )B  

then

    SELECT
        MAX(AtDate1),
        MIN(AtDate2)
    FROM
    (
        SELECT 
            at.Date1 AS AtDate1,
            at.Date2 AS AtDate2
        FROM
            dbo.tab1 a
        INNER JOIN
            dbo.tab2 at
        ON
            a.id = at.RootId
        AND CAST(GETDATE() AS DATE) BETWEEN at.Date1 AND at.Date2
        WHERE
            a.Number = 223889
    )B  

?

The second statement with the TOP attribute is six times faster.

The count(*) of the inner subquery is 9280 rows.

Can I use a HINT to declare that SQL Server optimiser make it right? execution plan

Was it helpful?

Solution

I see you've now posted the plans. Just luck of the draw.

Your actual query is a 16 table join.

SELECT max(atDate1)  AS AtDate1,
       min(atDate2)  AS AtDate2,
       max(vtDate1)  AS vtDate1,
       min(vtDate2)  AS vtDate2,
       max(bgtDate1) AS bgtDate1,
       min(bgtDate2) AS bgtDate2,
       max(lftDate1) AS lftDate1,
       min(lftDate2) AS lftDate2,
       max(lgtDate1) AS lgtDate1,
       min(lgtDate2) AS lgtDate2,
       max(bltDate1) AS bltDate1,
       min(bltDate2) AS bltDate2
FROM   (SELECT TOP 100000 at.Date1  AS atDate1,
                          at.Date2  AS atDate2,
                          vt.Date1  AS vtDate1,
                          vt.Date2  AS vtDate2,
                          bgt.Date1 AS bgtDate1,
                          bgt.Date2 AS bgtDate2,
                          lft.Date1 AS lftDate1,
                          lft.Date2 AS lftDate2,
                          lgt.Date1 AS lgtDate1,
                          lgt.Date2 AS lgtDate2,
                          blt.Date1 AS bltDate1,
                          blt.Date2 AS bltDate2
        FROM   dbo.Tab1 a
               INNER JOIN dbo.Tab2 at
                 ON a.id = at.Tab1Id
                    AND cast(Getdate() AS DATE) BETWEEN at.Date1 AND at.Date2
               INNER JOIN dbo.Tab5 v
                 ON v.Tab1Id = a.Id
               INNER JOIN dbo.Tab16 g
                 ON g.Tab5Id = v.Id
               INNER JOIN dbo.Tab3 vt
                 ON v.id = vt.Tab5Id
                    AND cast(Getdate() AS DATE) BETWEEN vt.Date1 AND vt.Date2
               LEFT OUTER JOIN dbo.Tab4 vk
                 ON v.id = vk.Tab5Id
               LEFT OUTER JOIN dbo.VerkaufsTab3 vkt
                 ON vk.id = vkt.Tab4Id
               LEFT OUTER JOIN dbo.Plu p
                 ON p.Tab4Id = vk.Id
               LEFT OUTER JOIN dbo.Tab15 bg
                 ON bg.Tab5Id = v.Id
               LEFT OUTER JOIN dbo.Tab7 bgt
                 ON bgt.Tab15Id = bg.Id
                    AND cast(Getdate() AS DATE) BETWEEN bgt.Date1 AND bgt.Date2
               LEFT OUTER JOIN dbo.Tab11 b
                 ON b.Tab15Id = bg.Id
               LEFT OUTER JOIN dbo.Tab14 lf
                 ON lf.Id = b.Id
               LEFT OUTER JOIN dbo.Tab8 lft
                 ON lft.Tab14Id = lf.Id
                    AND cast(Getdate() AS DATE) BETWEEN lft.Date1 AND lft.Date2
               LEFT OUTER JOIN dbo.Tab13 lg
                 ON lg.Id = b.Id
               LEFT OUTER JOIN dbo.Tab9 lgt
                 ON lgt.Tab13Id = lg.Id
                    AND cast(Getdate() AS DATE) BETWEEN lgt.Date1 AND lgt.Date2
               LEFT OUTER JOIN dbo.Tab10 bl
                 ON bl.Tab11Id = b.Id
               LEFT OUTER JOIN dbo.Tab6 blt
                 ON blt.Tab10Id = bl.Id
                    AND cast(Getdate() AS DATE) BETWEEN blt.Date1 AND blt.Date2
        WHERE  a.Nummer = 223889) B

On both the good and bad plans the Execution Plan shows "Reason for Early Termination of Statement Optimization" as "Time Out".

The two plans have slightly different join orders.

The only join in the plans not satisfied by an index seek is that on Tab9. This has 63,926 rows.

The missing index details in the execution plan suggest that you create the following index.

CREATE NONCLUSTERED INDEX [miising_index]
ON [dbo].[Tab9] ([Date1],[Date2])
INCLUDE ([Tab13Id])

The problematic part of the bad plan can be clearly seen in SQL Sentry Plan Explorer

Bad Plan

SQL Server estimates that 1.349174 rows will be returned from the previous joins coming into the join on Tab9. And therefore costs the nested loops join as if it will need to execute the scan on the inside table 1.349174 times.

In fact 2,600 rows feed into that join meaning that it does 2,600 full scans of Tab9 (2,600 * 63,926 = 164,569,600 rows.)

It just so happens that on the good plan the estimated number of rows coming in to the join is 2.74319. This is still wrong by three orders of magnitude but the slightly increased estimate means SQL Server favors a hash join instead. A hash join just does one pass through Tab9

Good Plan

I would first try adding the missing index on Tab9.

Also/instead you might try updating the statistics on all tables involved (especially those with a date predicate such as Tab2 Tab3 Tab7 Tab8 Tab6) and see if that goes some way to correcting the huge discrepancy between estimated and actual rows on the left of the plan.

Also breaking the query up into smaller parts and materialising these into temporary tables with appropriate indexes might help. SQL Server can then use the statistics on these partial results to make better decisions for joins later in the plan.

Only as a last resort would I consider using query hints to try and force the plan with a hash join. Your options for doing that are either the USE PLAN hint in which case you dictate exactly the plan you want including all join types and orders or by stating LEFT OUTER HASH JOIN tab9 .... This second option also has the side effect of fixing all join orders in the plan. Both mean that SQL Server will be severely limited is its ability to adjust the plan with changes in data distribution.

OTHER TIPS

It's hard to answer not knowing the size and structure of your tables, and not being able to see the entire execution plan. But the difference in both plans is Hash Match join for "top n" query vs Nested Loop join for the other one. Hash Match is very resource intensive join, because the server has to prepare hash buckets in order to use it. But it becomes much more effective for big tables, while Nested Loops, comparing each row in one table to every row in another table works great for small tables, because there's no such preparation needed. What I think is that by selecting TOP 1000000000000 rows in subquery you give the optimizer a hint that you're subquery will produce a great amount of data, so it uses Hash Match. But in fact the output is small, so Nested Loops works better. What I just said is based on shreds of information, so please have heart criticising my answer ;).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top