Why doesn't join elimination work with sys.query_store_plan?

https://dba.stackexchange.com/questions/216485

08-01-2021
|

Question

The following is a simplification of a performance problem encountered with the Query Store:

CREATE TABLE #tears
(
    plan_id bigint NOT NULL
);

INSERT #tears (plan_id) 
VALUES (1);

SELECT
    T.plan_id
FROM #tears AS T
LEFT JOIN sys.query_store_plan AS QSP
    ON QSP.plan_id = T.plan_id;

The plan_id column is documented as being the primary key of sys.query_store_plan, but the execution plan does not use join elimination as would be expected:

No attributes are being projected from the DMV.
The DMV primary key plan_id cannot duplicate rows from the temporary table
A LEFT JOIN is used, so no rows from T can be eliminated.

Execution plan

Why is this, and what can be done to obtain join elimination here?

La solution

The documentation is a little misleading. The DMV is a non-materialized view, and does not have a primary key as such. The underlying definitions are a little complex but a simplified definition of sys.query_store_plan is:

CREATE VIEW sys.query_store_plan AS
SELECT
    PPM.plan_id
    -- various other attributes
FROM sys.plan_persist_plan_merged AS PPM
LEFT JOIN sys.syspalvalues AS P
    ON P.class = 'PFT' 
    AND P.[value] = plan_forcing_type;

Further, sys.plan_persist_plan_merged is also a view, though one needs to connect via the Dedicated Administrator Connection to see its definition. Again, simplified:

CREATE VIEW sys.plan_persist_plan_merged AS   
SELECT     
    P.plan_id as plan_id,    
    -- various other attributes
FROM sys.plan_persist_plan P 
    -- NOTE - in order to prevent potential deadlock
    -- between QDS_STATEMENT_STABILITY LOCK and index locks   
    WITH (NOLOCK) 
LEFT JOIN sys.plan_persist_plan_in_memory PM
    ON P.plan_id = PM.plan_id;

The indexes on sys.plan_persist_plan are:

╔════════════════════════╦══════════════════════════════════════╦═════════════╗
║       index_name       ║          index_description           ║ index_keys  ║
╠════════════════════════╬══════════════════════════════════════╬═════════════╣
║ plan_persist_plan_cidx ║ clustered, unique located on PRIMARY ║ plan_id     ║
║ plan_persist_plan_idx1 ║ nonclustered located on PRIMARY      ║ query_id(-) ║
╚════════════════════════╩══════════════════════════════════════╩═════════════╝

So plan_id is constrained to be unique on sys.plan_persist_plan.

Now, sys.plan_persist_plan_in_memory is a streaming table-valued function, presenting a tabular view of data only held in internal memory structures. As such, it does not have any unique constraints.

At heart, the query being executed is therefore equivalent to:

DECLARE @t1 table (plan_id integer NOT NULL);
DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);

SELECT 
    T1.plan_id
FROM @t1 AS T1 
LEFT JOIN
(
    SELECT 
        T2.plan_id
    FROM @t2 AS T2
    LEFT JOIN @t3 AS T3 
        ON T3.plan_id = T2.plan_id
) AS Q1
    ON Q1.plan_id = T1.plan_id;

...which does not produce join elimination:

Getting right to the core of the issue, the problem is the inner query:

DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);

SELECT 
    T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3 
    ON T3.plan_id = T2.plan_id;

...clearly the left join might result in rows from @t2 being duplicated because @t3 has no uniqueness constraint on plan_id. Therefore, the join cannot be eliminated:

To workaround this, we can explicitly tell the optimizer that we do not require any duplicate plan_id values:

DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);

SELECT DISTINCT
    T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3 
    ON T3.plan_id = T2.plan_id;

The outer join to @t3 can now be eliminated:

Applying that to the real query:

SELECT DISTINCT
    T.plan_id
FROM #tears AS T
LEFT JOIN sys.query_store_plan AS QSP
    ON QSP.plan_id = T.plan_id;

Equally, we could add GROUP BY T.plan_id instead of the DISTINCT. Anyway, the optimizer can now correctly reason about the plan_id attribute all the way down through the nested views, and eliminate both outer joins as desired:

Note that making plan_id unique in the temporary table would not be sufficient to obtain join elimination, since it would not preclude incorrect results. We must explicitly reject duplicate plan_id values from the final result to allow the optimizer to work its magic here.

Licencié sous: CC-BY-SA avec attribution

Non affilié à dba.stackexchange