Question

I have a stored procedure that issues a query similar to the one below (pseudo-tsql).

Multiple ParentIds are passed in as a parameter (csv), parsed, and inserted into a table variable @i. For each ParentId passed in, we look up the StorageTable and include it in @i. Now, depending on the value of the StorageTable column, we need to fetch data out of the appropriate table (Table1, Table2, or Table3) by ParentId. There is no chance of duplicates across multiple tables - hence the UNION ALL.
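As an aside, a minimal sketch of that parse-and-populate step (assuming SQL Server 2016+ for STRING_SPLIT, and a hypothetical dbo.Parent lookup table that maps each ParentId to its StorageTable):

DECLARE @csv VARCHAR(MAX) = '1,2,3';  -- ParentIds passed in as CSV

DECLARE @i AS TABLE
(
    ParentId     INT,
    StorageTable VARCHAR(10)
);

-- Split the CSV and look up the StorageTable for each ParentId
-- (dbo.Parent is a hypothetical lookup table)
INSERT INTO @i (ParentId, StorageTable)
SELECT P.ParentId, P.StorageTable
FROM STRING_SPLIT(@csv, ',') AS S
INNER JOIN dbo.Parent AS P
    ON P.ParentId = CAST(S.value AS INT);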

When I examine the actual execution plan, I find that more than half of the cost (subtree cost) is attributed to StorageTables that were not even provided as inputs.

For example, if I only pass in ParentIds with StorageTable = 'Table1', Table2's Index Scan still shows up with a high cost in the execution plan.

As I would expect, STATISTICS IO did not show any reads against Table2, but the data access operators for Table2 still appear expensive in the actual execution plan.
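For reference, the per-table read counts come from something along these lines (the procedure and parameter names below are placeholders):

SET STATISTICS IO ON;

-- Hypothetical procedure and parameter names; the Messages tab then lists
-- logical/physical reads per table, and Table2 showed none here.
EXEC dbo.usp_GetData @ParentIds = '1,2,3';

SET STATISTICS IO OFF;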

In my mind, if a particular StorageTable is not present, the inner join with @i will return an empty result set and "short circuit" any additional work, no?

What could be the solution?

DECLARE @i AS TABLE
(
    ParentId INT,
    StorageTable VARCHAR(10)
)

INSERT INTO @i...
INSERT INTO @i...
INSERT INTO @i...

SELECT Col1, Col2, Col3
FROM dbo.Table1 AS T1
INNER JOIN (SELECT * FROM @i WHERE StorageTable = 'Table1') AS I
    ON T1.ParentId = I.ParentId
<joins>
<where clause>

UNION ALL

SELECT Col1, Col2, Col3
FROM dbo.Table2 AS T2
INNER JOIN (SELECT * FROM @i WHERE StorageTable = 'Table2') AS I
    ON T2.ParentId = I.ParentId
<joins>
<where clause>

UNION ALL

SELECT Col1, Col2, Col3
FROM dbo.Table3 AS T3
INNER JOIN (SELECT * FROM @i WHERE StorageTable = 'Table3') AS I
    ON T3.ParentId = I.ParentId
<joins>
<where clause>

Solution

You should pretty much ignore the subtree costs in this case.

Even in the actual plan they are just based on estimates.

From what you say about the STATISTICS IO output, the actual number of executions of the operators accessing Table2 is 0.

However, the plan will likely estimate Number of Executions = 1.

(You can see the estimated and actual figures in the Properties window in SSMS after selecting an operator.)

If some branches of the plan have an underestimate of the number of executions, you could try using a #temp table instead, so that column statistics are taken into account.
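For example, a minimal sketch of that swap (assuming the same two columns; the rest of the query stays the same apart from referencing #i instead of @i):

-- A #temp table gets column statistics, which a table variable does not have,
-- so the optimizer can see how many rows match each StorageTable value.
CREATE TABLE #i
(
    ParentId     INT,
    StorageTable VARCHAR(10)
);

INSERT INTO #i (ParentId, StorageTable)
SELECT ParentId, StorageTable
FROM @i;   -- or populate #i directly from the parsed input

-- Each branch then joins to #i instead of @i, e.g.:
-- INNER JOIN (SELECT * FROM #i WHERE StorageTable = 'Table1') AS I
--     ON T1.ParentId = I.ParentId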

You could get somewhat more representative subtree costs by adding some helper variables and OPTION (RECOMPILE), but even then they are only as accurate as the modelling assumptions and the estimates.

For example

DECLARE @T TABLE(
  X            INT,
  StorageTable VARCHAR(50));

INSERT INTO @T
VALUES      (1, 'Table1');

-- Flag which branches can possibly return any rows
DECLARE @Branch1Exists BIT = IIF(EXISTS (SELECT * FROM @T WHERE StorageTable = 'Table1'), 1, 0);
DECLARE @Branch2Exists BIT = IIF(EXISTS (SELECT * FROM @T WHERE StorageTable = 'Table2'), 1, 0);

SELECT X
FROM   @T
       JOIN master..spt_values V
         ON [@T].X = number
WHERE  @Branch1Exists = 1
UNION ALL
SELECT X
FROM   @T
       JOIN sys.objects
         ON [@T].X = object_id
WHERE  @Branch2Exists = 1
OPTION (RECOMPILE);

With the flags and OPTION (RECOMPILE), the branch of the plan that isn't executed is removed at compile time, rather than being shown with costs for an estimated single execution.
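Applied back to the query in the question, the same pattern would look roughly like this (a sketch only; the extra joins and WHERE clauses from the pseudo-query are omitted):

-- Flag which StorageTables were actually supplied
DECLARE @Table1Exists BIT = IIF(EXISTS (SELECT * FROM @i WHERE StorageTable = 'Table1'), 1, 0);
DECLARE @Table2Exists BIT = IIF(EXISTS (SELECT * FROM @i WHERE StorageTable = 'Table2'), 1, 0);
DECLARE @Table3Exists BIT = IIF(EXISTS (SELECT * FROM @i WHERE StorageTable = 'Table3'), 1, 0);

SELECT Col1, Col2, Col3
FROM dbo.Table1 AS T1
INNER JOIN (SELECT * FROM @i WHERE StorageTable = 'Table1') AS I
    ON T1.ParentId = I.ParentId
WHERE @Table1Exists = 1

UNION ALL

SELECT Col1, Col2, Col3
FROM dbo.Table2 AS T2
INNER JOIN (SELECT * FROM @i WHERE StorageTable = 'Table2') AS I
    ON T2.ParentId = I.ParentId
WHERE @Table2Exists = 1

UNION ALL

SELECT Col1, Col2, Col3
FROM dbo.Table3 AS T3
INNER JOIN (SELECT * FROM @i WHERE StorageTable = 'Table3') AS I
    ON T3.ParentId = I.ParentId
WHERE @Table3Exists = 1

-- With RECOMPILE, the variable values are known at compile time,
-- so branches whose flag is 0 are removed from the plan entirely.
OPTION (RECOMPILE);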

Licensed under: CC-BY-SA with attribution