Does SQL Server Only Perform Calculations In A SELECT List Once?
-
17-02-2021 - |
문제
Take the following example:
SELECT <CalculationA> As ColA,
<CalculationB> As ColB,
<CalculationA> + <CalculationB> As ColC
FROM TableA
Would CalculationA and CalculationB, each be calculated twice?
Or would the optimizer be clever enough to calculate them once and use the result twice?
I would like to perform a test to see the result for myself, however, I am not sure how I could check something like this.
My assumption is that it would perform the calculation twice.
In which case, depending upon the calculations involved, might it be better to use a derived table, or nested view? Consider the following:
SELECT TableB.ColA,
TableB.ColB,
TableB.ColA + TableB.ColB AS ColC,
FROM(
SELECT <CalculationA> As ColA,
<CalculationB> As ColB
FROM TableA
) As TableB
In this case, I would hope that the calculations would only be performed once?
Please can someone confirm or refute my assumptions? Or instruct me on how to test something like this for myself?
Thanks.
해결책
Most of the information you will need is going to be in the execution plan ( and the plan XML).
Take this query:
SELECT COUNT(val) As ColA,
COUNT(val2) As ColB,
COUNT(val) + COUNT(val2) As ColC
FROM dbo.TableA;
The execution plan (opened with sentryone plan explorer) shows what steps it went through:
With the stream aggregate aggregating the values for EXPR1005 & EXPR1006
If we want to know what these are, we could get the exact info on these expressions from the query plan XML:
<ColumnReference Column="Expr1005" />
<ScalarOperator ScalarString="COUNT([Database].[dbo].[TableA].[val])">
<Aggregate AggType="COUNT_BIG" Distinct="false">
With the first compute scalar computing ColA & ColB
:
And the last compute scalar being a simple addition:
This is reading it as the data flows, in theory you should be reading it from left to right if going over the logical execution.
In that case, EXPR1004
is calling the other expressions, EXPR1002
& EXPR1003
. In turn these are calling EXPR1005
& EXPR1006
.
Would CalculationA and CalculationB, each be calculated twice? Or would the optimizer be clever enough to calculate them once and use the result twice?
Previous tests show that in this case ColC
is simplified as an addition of the calculations that are defined as ColA
& ColB
.
As a result, ColA
& ColB
are only calculated once.
Grouping by 200 distinct values
If we are grouping by 200 distinct values (val3) the same is shown:
SET STATISTICS IO, TIME ON;
SELECT SUM(val) As ColA,
SUM(val2) As ColB,
SUM(val) + SUM(val2) As ColC
FROM dbo.TableA
GROUP BY val3;
Aggregating down to these 200 distinct values in val3
performing the sums on val & val2 and then adding them together for ColC:
Even if we are grouping on all but one non-unique value, the same addition should be seen for the compute scalar.
Adding a function to ColA & ColB
Even if we change the query to this:
SET STATISTICS IO, TIME ON;
SELECT ABS(SUM(val)) As ColA,
ABS(SUM(val2)) As ColB,
SUM(val) + SUM(val2) As ColC
FROM dbo.TableA
The aggregations are still not going to be calculated twice, we are simply adding the ABS()
function to the resultset of the aggregation, wich is one row:
Ofcourse, running SUM(ABS(ColA)
& SUM(ABS(ColB))
will make the optimizer unable to use the same expression for calculating ColC
.
If you want to go deeper in when this happens I would nudge you towards Query Optimizer Deep Dive - Part 1 (until Part 4) by Paul White.
Anther way to dive deeper into query execution phases is by adding these hints:
OPTION
(
RECOMPILE,
QUERYTRACEON 3604,
QUERYTRACEON 8605
);
This will expose the input tree as created by the optimizer.
The adding of the two previous calculated values to get ColC
is then translated to:
AncOp_PrjEl COL: Expr1004
ScaOp_Arithmetic x_aopAdd
ScaOp_Identifier COL: Expr1002
ScaOp_Identifier COL: Expr1003
This information is already present in the Input Tree, even before the simplification phase has taken place, showing that the optimizer immediately knows that it does not have to perform the same calculation twice.
다른 팁
If the first part of your calculation is an actual calculation (Col1 + Col2
) and not a function, then the individual calculations are performed for each "calculation" step.
SELECT <CalculationA> As ColA, <CalculationB> As ColB, <CalculationA> + <CalculationB> As ColC FROM TableA
If we replace <CalculationA>
from your statement with a valid calculation using ColA
and ColB
from a table, and repeat this for each subsequent <CalculationB>,...
step, then the actual task of calculating the result will be performed for each step individually.
To reproduce my statement paste the following code snippets into SQL Server Management Studio and run. Make sure you have turned on the option Include Actual Execution Plan.
It creates a database, a table, populates the table and performs a calculation which produces an execution plan.
CREATE DATABASE Q252661 GO USE Q252661 GO CREATE TABLE dbo.Q252661_TableA ( ColA INT, ColB INT, ColC INT, ColD INT) GO INSERT INTO Q252661_TableA ( ColA, ColB, ColC, ColD ) VALUES ( 1, 2, 3, 4 ),( 2, 4, 8, 16 ) GO SELECT ColA + ColB AS ColA, ColC + ColD AS ColB, ColA + ColB + ColC + ColD AS ColC FROM Q252661_TableA GO
The query will run and produce a graphical execution plan similar to the following:
Graphical Execution Plan of ADDing values
As in Randi's answer we'll focus on the Compute Scalar operator.
If you click on the Query Execution Plan in SSMS and right-click to show the actual plan:
.. you will find the following XML (focusing on the Compute Scalar portion):
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1003" />
<ScalarOperator ScalarString="[Q252661].[dbo].[Q252661_TableA].[ColA]+[Q252661].[dbo].[Q252661_TableA].[ColB]">
<Arithmetic Operation="ADD">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColA" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColB" />
</Identifier>
</ScalarOperator>
</Arithmetic>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1004" />
<ScalarOperator ScalarString="[Q252661].[dbo].[Q252661_TableA].[ColC]+[Q252661].[dbo].[Q252661_TableA].[ColD]">
<Arithmetic Operation="ADD">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColC" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColD" />
</Identifier>
</ScalarOperator>
</Arithmetic>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1005" />
<ScalarOperator ScalarString="[Q252661].[dbo].[Q252661_TableA].[ColA]+[Q252661].[dbo].[Q252661_TableA].[ColB]+[Q252661].[dbo].[Q252661_TableA].[ColC]+[Q252661].[dbo].[Q252661_TableA].[ColD]">
<Arithmetic Operation="ADD">
<ScalarOperator>
<Arithmetic Operation="ADD">
<ScalarOperator>
<Arithmetic Operation="ADD">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColA" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColB" />
</Identifier>
</ScalarOperator>
</Arithmetic>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColC" />
</Identifier>
</ScalarOperator>
</Arithmetic>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColD" />
</Identifier>
</ScalarOperator>
</Arithmetic>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="23" EstimateCPU="8.07E-05" EstimateIO="0.0032035" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="2" LogicalOp="Table Scan" NodeId="1" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.0032842" TableCardinality="2">
<OutputList>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColA" />
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColB" />
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColC" />
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColD" />
</OutputList>
<TableScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColA" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColB" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColC" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" Column="ColD" />
</DefinedValue>
</DefinedValues>
<Object Database="[Q252661]" Schema="[dbo]" Table="[Q252661_TableA]" IndexKind="Heap" Storage="RowStore" />
</TableScan>
</RelOp>
</ComputeScalar>
So each individual calculation is performed again and again in the case of the values being retrieved from an actual table. The following XML fragment is from the above summary:
<ScalarOperator ScalarString="[Q252661].[dbo].[Q252661_TableA].[ColA]+[Q252661].[dbo].[Q252661_TableA].[ColB]">
<Arithmetic Operation="ADD">
There are five <Arithmetic Operation="ADD">
steps in the execution plan.
Answering your Question
Would CalculationA and CalculationB, each be calculated twice?
Yes, if the calculations are actual sums of columns as per the example. The last calculation would be the sum of CalculationA + CalculationB
.
Or would the optimizer be clever enough to calculate them once and use the result twice?
It depends on what you are calculating. - In this example: yes. - In Randi's answer: no.
My assumption is that it would perform the calculation twice.
You are right for certain calculations.
In which case, depending upon the calculations involved, might it be better to use a derived table, or nested view?
Correct.
Once you have finished you can drop your database again:
USE [master]
GO
DROP DATABASE Q252661
Since there are already good answers to the question I will focus on the DRY-aspect (DRY = don't repeat yourself).
I got used to use CROSS APPLY
, if I have to do the same calculations in the same query multiple times (don't forget GROUP BY / WHERE / ORDER BY
, where the same calcualtions tend to occure again and again and again).
SELECT calc.ColA,
calc.ColB,
calc.ColA + calc.ColB AS ColC
FROM TableA AS a
CROSS APPLY (SELECT a.Org_A * 100 AS ColA
, a.Org_B / 100 AS ColB
) AS calc
WHERE calc.ColB = @whatever
ORDER BY calc.ColA
When one calculation depends on another, there is no reason to not use multiple CROSS APPLY
calls to calculate interim results (same, when you need to use the end result in the WHERE
/ ORDER BY
)
SELECT calc1.ColA,
calc2.ColB,
calc3.ColC
FROM TableA AS a
CROSS APPLY (SELECT a.Org_A * 100 AS ColA) AS calc1
CROSS APPLY (SELECT calc1.ColA * 100 AS ColB) AS calc2
CROSS APPLY (SELECT calc1.ColA + calc2.ColB AS ColC) AS calc3
WHERE calc.ColB = @whatever
ORDER BY calc.ColA, calc3.ColC
The main point for doing this is that you have to modfiy / correct only one row of code when you find a bug or have to change something instead of multiple occurences and do not risk to have multiple (slightly different, since you forgot to change one) versions of the same calculation.
PS: regarding the readability CROSS APPLY
will usually win vs. several nested sub selects (or CTEs), particularly when the calculations use columns from different source tables or you have interim results.
If both Expressions are deterministic, the optimizer will reuse the previously calculated results and simply add them. If either expression is nondeterministic, that expression MUST be recalculated.