WHERE Clause Compare VarChar(n) Column = VarChar(MAX) Variable Results in Inequality Comparison and Poor Query Performance
-
11-03-2021 - |
Question
Earlier today I was working on a Reporting Stored Procedure
which had poor performance. I was able to fix the issue by changing a static variable
which held a string of ten characters or less from VARCHAR(MAX)
to VARCHAR(10)
. I am looking to understand why that would have made a difference.
Additional Details If Needed
The Original query is much bigger than what needs to be looked at but I saw the below things happening, when joining the foundational table
to a Key-Value Pair
table
we were returning a ton more rows
than was actually needed (in some instances over 14 billion rows for a final output of ~60,000 rows) and we had a few extra operations (particularly the Lazy Table Spool
). Image of that part of the Query Plan is below:
I could see a weird part of the execution plan was how the @ProgramID
variable
was being handled. This variable is used as [Column] = @ProgramID
and the Column
is part of the tables Non-Clustered Index
and is a VarChar(10)
. But we are still doing an Inequality Search
instead of an Equality Search
.
After modifying this parameter/variable
from VarChar(Max)
to VarChar(10)
. The execution time
dropped from several hours to about 30 seconds and the plan had a very different structure. Looking at a similar snippit from the execution plan, we were not returning nearly as many rows, we don't have the extra operations and we were doing an equality search
on [Column] = @ProgramID
(again Column
is part of the tables Non-Clustered Index
and is a VarChar(10)
).
Below is a simplified query which shows some of this behavior (we don't get all the extra steps, but the Seek Predicate difference is still there): (Paste The Plan)
DECLARE @CompanyID VarChar(10) = 'RxCRoads'
,@ClientID VarCHar(10) = 'Amgen'
,@ProgramID VarChar(MAX) = 'Foundation'
,@StartDate DATETIME = '2020-01-01'
,@EndDate DATETIME = '2020-12-07'
,@RxOnly INT = 9
SELECT PC.CompanyID,
PC.ClientID,
PC.ProgramID,
PC.PatientID,
PC.CaseID,
RxOnly.[Value] AS RxOnly
FROM PATIENTCASES PC
LEFT OUTER JOIN PatientCaseDetail RxOnly
ON RxOnly.CompanyId = PC.CompanyID
AND RxOnly.ClientId = PC.ClientID
AND RxOnly.ProgramId = PC.ProgramID
AND RxOnly.PatientID = PC.PatientID
AND RxOnly.CaseID = PC.CaseID
AND RxOnly.PatientCaseAdditionalElementId = @RxOnly
WHERE PC.CompanyID = @CompanyID
AND PC.ClientID = @ClientID
AND PC.ProgramID = @ProgramID
AND PC.CaseCreateDateTime >= @StartDate
AND PC.CaseCreateDateTime < @EndDate
Just changing the @ProgramID VarChar(MAX) = 'Foundation')
to @ProgramID VarChar(10) = 'Foundation'
returns a more streamline plan and better performance (even though not really noticeable in this particular query). (Paste The Plan)
Is there any particular reason why this change in the size of the data type would result in the change in the execution plan? Running the same query with @ProgarmId VarChar(8000)
instead of @Program VarChar(10)
has no noticeable difference from @Program VarChar(10)
. (Paste The Plan)
My guess is that VarChar(MAX)
has different comparison rules due to its size, and this is just a state of what VarChar(MAX)
is. But I didn't know if anyone had a better/more technical answer than I did so I can better apply this lesson in the future.
Solution
The Problems
You have two things working against you:
- Local variables often have weird side effects
- MAX datatypes can't be pushed, so you end up with late filters in your query plans
Some further reading:
- Yet Another Post About Local Variables
- MAX Data Types Do WHAT?
- Why You Shouldn’t Ignore Filter Operators In Query Plans Part 1 | Part 2
- Why does SQL Server use a better execution plan when I inline the variable?
In Particular
In your worse plan, there are two filter operators:
Each one is there to deal with the MAX type on @ProgramId
. The length of the parameter doesn't matter much until you hit the max type, so long as it's the correct type, e.g. varchar = varchar
or nvarchar = nvarchar
You may also see GetRangeThroughMismatchedType
or GetRangeThroughConvert
there to deal with a date
\ datetime
mismatch:
You may find some overall relief by adding a recompile hint, but it won't fix the issue you're having around the MAX variable.
OTHER TIPS
Length is one of the things used by the optimizer to determine the optimal execution plan.
When using MAX, If SQL Server wants to use an index, it has to convert equal your variable to the column.
We can see Compute Scalar
operator on your execution plan. It is a lightweight operator but using Compute Scalar
with Nested Loop
will be worse performance. Because Nested Loop
goes to Compute Scalar
operation for every row come to.
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ValueVector>
<ColumnReference Column="Expr1003"/>
<ColumnReference Column="Expr1004"/>
<ColumnReference Column="Expr1002"/>
</ValueVector>
<ScalarOperator ScalarString="GetRangeWithMismatchedTypes([@ProgramID],[@ProgramID],(62))">
<Intrinsic FunctionName="GetRangeWithMismatchedTypes">
<ScalarOperator>
<Identifier>
<ColumnReference Column="@ProgramID"/>
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Column="@ProgramID"/>
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(62)"/>
</ScalarOperator>
</Intrinsic>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="0" EstimateCPU="0" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Constant Scan" NodeId="4" Parallel="false" PhysicalOp="Constant Scan" EstimatedTotalSubtreeCost="0">
<OutputList/>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="1" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0"/>
</RunTimeInformation>
<ConstantScan/>
</RelOp>
</ComputeScalar>
VARCHAR(N)
and VARCHAR(MAX)
kinda two different data types like DateTime
and Date
. VARCHAR(MAX)
is BLOB
and VARCHAR(n)
In-Row data
. So, if your column type does not VARCHAR(MAX), SQL Server needs to convert your variable for using the index.
There is a deep dive blog post from Paul White. But, doesn't include your example.