What Method / Formula does a Nested Loops Operator use for row estimation?

https://dba.stackexchange.com/questions/275222

07-03-2021

Question

The following, simple query in AdventureWorks:

SELECT  *
FROM    Person.Person p
        JOIN HumanResources.Employee e
            ON p.BusinessEntityID = e.BusinessEntityID

Gives the following execution plans:

New estimator plan

In the plan above, the index scan and the index seek both (correctly) estimate 290 rows. However, the nested loops operator that joins the two estimates 279 rows.

Old estimator

The old estimator also correctly estimates 290 rows out of both the seek and the scan, but its nested loops operator estimates 289 rows, which for this query is the better estimate.

Is it true, then, that with the new CE the optimizer estimates that when it joins 290 rows from the index scan and 290 rows from the index seek, 11 rows will not match?

What method / formula does it use to make this estimate?

Am I correct in saying that, whatever that method is, it has changed from the earlier CE version, given that the earlier version produces a different estimate?

I realise the "bad" estimate of the new CE is not significant enough to hurt performance; I am just trying to understand how the estimator works.

Solution

SQL Server estimates cardinality for joins, not physical operators. The inner join in question will have the same estimate regardless of the physical operator employed (hash, merge, nested loops, or apply). The physical operator may affect the display, but the selectivity of the logical join is the same.

With that out of the way, logical join estimation is still a complex topic. There are many valid ways to produce an estimate. Two main alternatives are well covered by Dmitry Piliugin in Join Estimation Internals in SQL Server. You can also find general differences between the original and latest CE models in the Microsoft paper Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator.

One main difference occurs when using histograms to estimate join selectivity. The more recent CE model uses coarse alignment, as I describe in SQL Server Join Estimation using Histogram Coarse Alignment. The original CE used fine alignment with linear interpolation for each step, which can give more accurate estimates, but can also be more variable.
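
To see the two models side by side on the question's query, something like the following sketch could be used. It assumes the database runs at compatibility level 120 or higher with the legacy CE option switched off (so the first statement uses the newer model), and that the instance is SQL Server 2016 SP1 or later, where the documented `USE HINT` syntax is available. Compare the estimated row count on the join operator in each plan.

```sql
-- Newer CE model (assumption: database compatibility level 120 or higher,
-- and LEGACY_CARDINALITY_ESTIMATION is OFF for this database).
SELECT  *
FROM    Person.Person p
        JOIN HumanResources.Employee e
            ON p.BusinessEntityID = e.BusinessEntityID;

-- Same query, forced back to the original CE model for this statement only.
-- USE HINT requires SQL Server 2016 SP1 or later.
SELECT  *
FROM    Person.Person p
        JOIN HumanResources.Employee e
            ON p.BusinessEntityID = e.BusinessEntityID
OPTION  (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```

On older versions, `OPTION (QUERYTRACEON 9481)` is the documented way to request the original model for a single query.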

You may notice that using Simple Join (with trace flag 9479 given by Dima) gives a 'perfect' estimate for your test query.

In Dima's words: "There is a TF that forces the optimizer to use Simple Join algorithm even if a histogram is available. I will give you this one for the test and educational purposes."

In other (more common) cases, simple join will give a terrible result. Such is the nature of cardinality estimation.
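
For anyone wanting to try this on a test system, a minimal sketch of the question's query with the trace flag applied might look like this. Trace flag 9479 is undocumented, so treat it as something for experiments only; `QUERYTRACEON` itself is a documented query hint and normally requires sysadmin membership.

```sql
-- Test and educational use only: trace flag 9479 is undocumented.
-- It asks the optimizer to use simple join estimation even though
-- histograms are available for the join columns.
SELECT  *
FROM    Person.Person p
        JOIN HumanResources.Employee e
            ON p.BusinessEntityID = e.BusinessEntityID
OPTION  (QUERYTRACEON 9479);
```

Requesting the estimated plan for this statement is enough to see the changed join estimate; the query does not need to be executed.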

Licensed under: CC-BY-SA with attribution