MSSQL Join between big and empty tables

https://dba.stackexchange.com/questions/257532

22-02-2021

Question

I've got an issue with the estimated cost and a join predicate:

SELECT c.Id FROM Table_A a
LEFT JOIN Table_B b ON b.Id_A = a.Id
LEFT JOIN Table_C c ON c.Id_B = b.Id
WHERE a.Id = 2500
OPTION (RECOMPILE)

Table_A has 50k rows, Table_B is empty, and Table_C has 2M rows. The PK and FK indexes exist and the statistics are up to date.

But for some reason, SQL Server uses a Clustered Index Scan on Table_C: https://www.brentozar.com/pastetheplan/?id=r1G_LnpeU

This is a sample of my problem. The production tables are far bigger, and because of this issue the queries request far too large a memory grant.

Using a FORCESEEK hint, or changing the join to LEFT JOIN Table_C c ON c.Id_B = b.Id AND c.Id_B IS NOT NULL, resolves the issue. But the query (multiple queries, in fact) is generated by Entity Framework, so I don't have much control over it.
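For reference, the FORCESEEK workaround could look roughly like the sketch below. It assumes the table hint is placed on Table_C, the table that was being scanned; because it is a table hint inside the query text, Entity Framework would have to emit it, which is exactly the control that is missing here.

```sql
-- Sketch: the original query with a FORCESEEK table hint on Table_C
-- (assumption: the hint goes on the table that was being scanned).
SELECT c.Id
FROM Table_A a
LEFT JOIN Table_B b ON b.Id_A = a.Id
LEFT JOIN Table_C c WITH (FORCESEEK) ON c.Id_B = b.Id
WHERE a.Id = 2500
OPTION (RECOMPILE);
```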

Is there a way to get a seek on Table_C without changing the query?

Solution

I was finally able to replicate your issue thanks to your note that

LEFT JOIN Table_C c ON c.Id_B = b.Id AND c.Id_B IS NOT NULL

improved the query.

DDL & DML

CREATE TABLE Table_A (id int not null,
                     constraint ClusteredIndex_A primary key (id))

INSERT INTO dbo.Table_A(id)
SELECT top(50000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM MASTER..spt_values spt1
CROSS APPLY MASTER..spt_values spt2;


CREATE TABLE Table_B (id int not null,
                      Id_A int,
                      constraint ClusteredIndex_B primary key (id));
-- Table_B is intentionally left empty.

CREATE TABLE Table_C (id int not null,
                      Id_B int,
                      constraint ClusteredIndex_C primary key (id),
                      constraint FK_C_B FOREIGN KEY (id_B) REFERENCES Table_B(id));

-- 2M rows, all with Id_B = NULL (this is what reproduces the scan).
INSERT INTO dbo.Table_C(id,Id_B)
SELECT top(2000000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
NULL
FROM MASTER..spt_values spt1
CROSS APPLY MASTER..spt_values spt2;

CREATE INDEX [NonClusteredIndex-B-A] on dbo.Table_B(id_A);
CREATE INDEX [NonClusteredIndex-C-B] on dbo.Table_C(id_B);

As a result, your query is slow because of all the NULL values in Table_C.

SELECT c.Id FROM Table_A a 
LEFT JOIN Table_B b ON b.Id_A = a.Id 
LEFT  JOIN Table_C c ON c.Id_B = b.Id
WHERE a.Id = 2500 OPTION (RECOMPILE);

This gives a starting point close to yours.
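One way to check that the NULLs are what the optimizer is working with is to count them and look at the histogram on the Id_B index. This is a sketch against the repro tables above; the exact DBCC output layout can vary between versions.

```sql
-- How many Table_C rows have a non-NULL Id_B? In the repro, none do.
SELECT COUNT(*)    AS total_rows,
       COUNT(Id_B) AS non_null_id_b   -- COUNT(column) skips NULLs
FROM dbo.Table_C;

-- Histogram of the index on Id_B: expect a single step for the NULL
-- value whose EQ_ROWS is close to the table's row count.
DBCC SHOW_STATISTICS ('dbo.Table_C', 'NonClusteredIndex-C-B') WITH HISTOGRAM;
```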

The easiest and best solution here would be to change the query, as you pointed out:

SELECT c.id
FROM Table_A a 
LEFT JOIN Table_B b ON b.Id_A = a.Id 
LEFT  JOIN Table_C c ON c.Id_B = b.Id and c.Id_B is not null
WHERE a.Id = 2500;

A possible, but far from ideal, solution would be to add OPTION (LOOP JOIN) through a plan guide:

exec sp_create_plan_guide   
@name = N'Guide_1',  
@stmt = N'SELECT c.id
FROM Table_A a 
LEFT JOIN Table_B b ON b.Id_A = a.Id 
LEFT JOIN Table_C c ON c.Id_B = b.Id 
WHERE a.Id = 2500;',  
@type = N'SQL',  
@module_or_batch = NULL,
@params = NULL,  
@hints = N'OPTION (LOOP JOIN)';

But the high estimates on dbo.Table_C persist, and plan guides are a last resort (if that).
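If you do try the plan guide, the catalog view sys.plan_guides shows whether Guide_1 (the example name used above) exists and is enabled, and sp_control_plan_guide can disable or drop it. Keep in mind that a plan guide only matches when @stmt matches the submitted statement text exactly; Entity Framework typically sends parameterized statements through sp_executesql, in which case @params would also have to match what EF sends.

```sql
-- List plan guides in the current database, including Guide_1.
SELECT name, is_disabled, scope_type_desc, hints, query_text
FROM sys.plan_guides;

-- Turn the guide off without removing it...
EXEC sp_control_plan_guide @operation = N'DISABLE', @name = N'Guide_1';

-- ...or remove it entirely once it is no longer wanted.
EXEC sp_control_plan_guide @operation = N'DROP', @name = N'Guide_1';
```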

I'm going to leave this up for now so other people can replicate and test your problem using the DDL & DML above. Maybe statistics or indexing could provide a solution that I was unable to find.

Edit

OPTION (FAST 1) produces single-row estimates and a low cost (for this particular example):

SELECT c.Id FROM Table_A a 
LEFT JOIN Table_B b ON b.Id_A = a.Id 
LEFT  JOIN Table_C c ON c.Id_B = b.Id
WHERE a.Id = 2500 OPTION (RECOMPILE,FAST 1);

Again, not ideal, since queries returning bigger result sets will be affected and other issues might arise.

What also works is inserting one row into dbo.Table_B:

INSERT INTO dbo.Table_B(id,Id_A)
VALUES(1,NULL);

and updating the statistics:

UPDATE STATISTICS dbo.Table_B;

so that SQL Server does not have to join Table_C to an empty Table_B.

Or insert a row with id -1 instead:

INSERT INTO dbo.Table_B(id,Id_A)
VALUES(-1,NULL);

What you could then do is script out the statistics generated with that one row, and apply them to the empty dbo.Table_B.

To do that, right-click the database --> Tasks --> Generate Scripts.

Choose Table_B.

In the advanced scripting options, select "Script statistics and histograms".

Save to a new query window, then click Next, Next, Finish.

This gave me the following statistics:

/****** Object:  Statistic [ClusteredIndex_B]    Script Date: 17/01/2020 10:58:36 ******/
UPDATE STATISTICS [dbo].[Table_B]([ClusteredIndex_B]) WITH STATS_STREAM = 0x0100000001000000000000000000000019B912CC000000007F020000000000003F02000000000000380300003800000004000A000000000000000000000000000700000046C8B30045AB000001000000000000000100000000000000000000000000803F000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000010000000100000014000000000080400000803F00000000000080400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000190400000000000000000000000000001F00000000000000BB00000000000000C300000000000000CB000000000000000800000000000000100014000000803F000000000000803FFFFFFFFF04000002000000859CB10045AB0000000000000000F03F030000000000000001000000000000000000F03F00000000000000000000000000000000000000000000004000000000000000400000000000000000BF89B00045AB0000000000000000F03F010000000000000001000000000000000000F03F00000000000000000000000000000000000000000000F03F000000000000F03F000000000000F03F01000000000000000000000000000000, ROWCOUNT = 2, PAGECOUNT = 1
GO
/****** Object:  Statistic [NonClusteredIndex-B-A]    Script Date: 17/01/2020 10:58:36 ******/
UPDATE STATISTICS [dbo].[Table_B]([NonClusteredIndex-B-A]) WITH STATS_STREAM = 0x010000000200000000000000000000000ECB838500000000CC010000000000007401000000000000380200003800000004000A00000000000000000000000000380300003800000004000A00000000000000000000000000070000007721B40045AB000001000000000000000100000000000000000000000000803F0000803F0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000014000000000080400000803F0000803F000000000000804000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100400000000000000000000000000000800000000000000100000000000000001000000000000000000000000000000, ROWCOUNT = 1, PAGECOUNT = 1
GO

Then I deleted all the data:

DELETE FROM dbo.Table_B

and reran the two UPDATE STATISTICS ... WITH STATS_STREAM statements above, to get a better-performing execution plan with correct estimates and results.
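To confirm what the optimizer will see for the now-empty Table_B, you can compare the actual row count with the statistics metadata. This is a sketch, assuming the two STATS_STREAM statements were run after the DELETE; the statistics rows should reflect the imported values rather than the empty table. Note that the UPDATE STATISTICS documentation lists STATS_STREAM as informational only and not supported, and that any later statistics update, manual or automatic, replaces the imported values.

```sql
-- The table itself is empty after the DELETE...
SELECT COUNT(*) AS actual_rows FROM dbo.Table_B;

-- ...while the statistics objects keep the imported header values.
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,                  -- row count stored in the statistics object
       sp.modification_counter   -- changes since the statistics were set
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Table_B');
```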

Licensed under: CC-BY-SA with attribution
