Why does SQL Server not perform constant (UNION ALL) branch elimination with OPTION(RECOMPILE) when selecting the result into a scalar variable?

dba.stackexchange https://dba.stackexchange.com/questions/286705

Question

We use some 'aggregate' views to select from multiple tables using a discriminator (note: these views are not partitioned views, because the discriminator is not in the base tables). This normally works well when using option(recompile), as the query planner will eliminate the non-reachable union all paths before selection of a query plan.

However, this constant-folding optimization appears defeated when selecting the result into a scalar variable. Selecting the result into a temporary table variable does not de-optimize the recompilation.

Here is a reproduction case in SQL Server 2017:

-- A table; no data is needed. Assumes the [test] schema already exists.
create table [test].test_table (col1 int, primary key (col1));
go

-- A simple 'aggregate' view. Using the same table here is irrelevant and,
-- while the view shows the scenario, it might not be required to reproduce the issue.
-- (create view must be the first statement in its batch, hence the go separators.)
create view [test].test_view as
select col1, descrim = 1 from [test].test_table
union all
select col1, descrim = 2 from [test].test_table;
go

Normal query, which results in an optimized query plan touching only one of the union all branches:

declare @descrim int = 2;

select count(col1)
from [test].test_view
where descrim = @descrim
option (recompile) -- explicit recompile here "works"

However, as soon as a "select into scalar variable" is used, the plan becomes de-optimized: it no longer eliminates the unused union all branch. (The plan is still correctly optimized when using a literal value in the query text.)

declare @descrim int = 2;
declare @brokeit int;

select @brokeit = count(col1)
from [test].test_view
where descrim = @descrim
option (recompile) -- explicit recompile here does NOT optimize plan for @descrim!

1. Is this de-optimization "expected"?

2. Where is this significant de-optimization behavior with respect to option(recompile) and/or selecting into a scalar variable documented or otherwise discussed in depth?

3. Is there a simple way to get a recompile-optimized plan with select @x = .. without using a temporary table (variable)?

While the union all will prevent actual IO access to the secondary artifact during query execution, this is still a problem for query plan generation. In the specific error case that prompted this question, leaving multiple tables in consideration prevents SQL Server from choosing an appropriate seek plan, and the resulting plan options are very poor choices in the given domain.

The first "good" plan: [execution plan image]

The second and "bad" plan: [execution plan image]

This "bad" plan also has an implicit conversion warning, making me suspect that the select into a scalar variable might be bypassing many different optimizations - or even ignoring the option(recompile) hint entirely.

Solution

Constant folding has a particular meaning in SQL Server. It is not directly involved in your question. The features that combine to produce extensive simplification of the execution plan for you are the parameter embedding optimization (PEO) and contradiction detection.

PEO embeds the literal value of e.g. a parameter or local variable into the query text, in circumstances where it is safe to do so. One of the requirements is that OPTION (RECOMPILE) is specified. This guarantees that the plan generated will never be reused, so it may be safe to replace non-literals with sniffed literals.

The OPTION (RECOMPILE) hint itself only provides that a new plan will be generated on each execution, the sniffed values of any parameters will be used for cardinality estimation, and the one-off plan generated will not be cached for reuse after execution.

PEO was first added to the product in SQL Server 2008, though it was disabled shortly afterwards due to the possibility of incorrect results. It was re-enabled in SQL Server 2008 SP1 CU5 (Microsoft blog post).

A query optimized with PEO applied appears to the query optimizer exactly as if the query had been written with literals instead of parameters or variables. Contradiction detection can remove entire clauses or relational operator subtrees where literal expressions like WHERE 0 = 1 appear. This facility exists because automated tools often generate such SQL.
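
To illustrate with the objects from the question: once the value of @descrim has been embedded, the first query is treated as though it had been written with the literal 2. The following is only a sketch of the equivalent text, with the view expanded by hand into a derived table (the alias v is made up for this illustration); the optimizer works on an internal tree, not on rewritten SQL.

```sql
-- Roughly what the optimizer sees after @descrim = 2 is embedded.
select count(col1)
from
(
    select col1, descrim = 1 from [test].test_table
    union all
    select col1, descrim = 2 from [test].test_table
) as v
where descrim = 2;
-- Pushing the predicate into each branch gives 1 = 2 for the first branch
-- (a contradiction, so that subtree can be removed) and 2 = 2 for the second.
```

With a variable that cannot be embedded, the comparison is against a value unknown at compile time, so neither branch can be discarded during optimization.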

It is not always safe to apply PEO, but the exceptions are not officially documented. One exception is where variable assignment occurs (others exist, such as where the parameter appears in an OPTIMIZE FOR clause). My understanding is that variable assignment involves a good deal of complex legacy behaviour, with occasionally odd semantics, preserved for backwards compatibility reasons. It would be impractical to guarantee that PEO would operate correctly in all circumstances, so it is disabled for that case.

PEO is an opportunistic facility that goes beyond the documented behaviour of OPTION (RECOMPILE). It can deliver significant performance benefits in many cases, but it is not officially documented. One might usefully regard it as a bonus feature - nice when you get it, but there are no refunds in case of disappointment.

In your example, where PEO cannot be applied, elimination of subtree execution is provided by start-up filters (which are documented). The Filter operators shown in the "unoptimized" plan are start-up filters that only execute their subtree if the start-up predicate evaluates to true.

The lack of a "seek plan", in the PEO context, is often due to optimizations that can only be performed (either due to safety issues or implementation limitations) when a literal value is present. This literal may appear in the original text, or it may have been substituted in via PEO. An example of this is the optimization rule SelOnSeqPrj, which allows a predicate to move past a sequence function like ROW_NUMBER when safe, but only when a literal value is available (see this old Stack Overflow answer of mine).

A SQL Server 2017 repro of the code in the question does not produce the extra Compute Scalar and implicit conversion mentioned in the question. It appears the query used to produce that plan was different to that given in the question. Or perhaps the instance or database has some important configuration or option that was not specified. In any case, I was unable to reproduce it.

The OPTION (RECOMPILE) hint is never ignored. The only implicit conversion in the query is from the internal bigint result of COUNT(*) to integer as required by COUNT (as opposed to COUNT_BIG).

Depending on the requirements and restrictions of the real application behind the question, you may need to employ dynamic SQL, or some other solution. Feel free to ask a new question about potential solutions to the underlying problem, if it can be expressed in a way that is suitable for our Q & A format.
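
As a rough sketch of the dynamic SQL route, assuming the test objects from the question: the value is written into the statement text as a literal, which is the case the question already reports as being optimized correctly even with variable assignment. The names @sql and @result are made up for this example.

```sql
declare @descrim int = 2;
declare @brokeit int;

-- Build the statement with the int value written into the text as a literal.
-- Converting an int to a string cannot inject arbitrary SQL; do not
-- concatenate untrusted character data this way.
declare @sql nvarchar(max) =
    N'select @result = count(col1)
      from [test].test_view
      where descrim = ' + convert(nvarchar(11), @descrim) + N';';

-- @result is an output parameter declared only for this sketch.
exec sys.sp_executesql
    @sql,
    N'@result int output',
    @result = @brokeit output;

select @brokeit as result;
```

Each distinct value produces different statement text, so this trades plan reuse for a plan tailored to the literal. Whether that is acceptable depends on how many distinct discriminator values the real workload uses.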


The brief answers to your questions are:

  1. Yes.
  2. I cover it in "Parameter Sniffing, Embedding, and the RECOMPILE Options".
  3. No.

OTHER TIPS

You can SELECT INTO a temp table and assign your variable from that as a workaround:

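declare @descrim int = 2;
declare @brokeit int;

-- Materialize the count first. There is no variable assignment in this
-- statement, so the recompiled plan can still be simplified for @descrim.
select count(col1) as c
into #results
from [test].test_view
where descrim = @descrim
option (recompile);

-- Assign the variable in a separate statement.
select @brokeit = c from #results;

drop table #results;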
Licensed under: CC-BY-SA with attribution