Reused Nested Decision Logic - CTE vs Copying Code
05-02-2021
Question
I have a query which outputs one column, created through a series of CASE statements. That same column is used as part of the CASE logic for a 2nd column in the same SELECT statement.
If I were to move the inner logic into a CTE, so that I can reference it by name when I need it as a decision input later in the query, what is the overall additional overhead?
As I understand it, there is no real overhead added (some research and a simple test case are below). Are there articles that discuss this, or cases where this does not hold?
Research and Simple Test Case
I found a couple of articles which don't address this specific question, but which point toward no operational overhead being added.
- https://www.scarydba.com/2016/07/18/common-table-expression-just-a-name/
- https://www.sqlshack.com/why-is-my-cte-so-slow/
I wrote a small query against one of our existing databases to test this theory. The results and execution times were essentially the same, as were the statistics and the query execution plan.
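For anyone who wants to repeat this kind of comparison, here is a minimal sketch. It assumes SQL Server and uses SET STATISTICS IO and SET STATISTICS TIME so that logical reads and CPU/elapsed times are reported for each statement. The two queries are trimmed stand-ins (one table, and a placeholder CASE expression assumed purely for illustration), not the full queries from this question.

```sql
-- Report per-table logical reads and CPU/elapsed time for every statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Version 1: the CASE expression is written out twice.
-- (Placeholder logic: falls back to the current date when the column is NULL.)
SELECT PC.CaseID,
       CASE WHEN PC.CaseCreateDateTime IS NULL THEN GETDATE()
            ELSE PC.CaseCreateDateTime
       END AS ImportantDate,
       DATEDIFF(DAY,
                CASE WHEN PC.CaseCreateDateTime IS NULL THEN GETDATE()
                     ELSE PC.CaseCreateDateTime
                END,
                GETDATE()) AS DaysSinceImportantDate
FROM PATIENTCASES AS PC
WHERE PC.CompanyID = 'RxCRoads';

-- Version 2: the same expression is named once in a CTE and reused.
WITH CTE AS
(
    SELECT PC.CaseID,
           CASE WHEN PC.CaseCreateDateTime IS NULL THEN GETDATE()
                ELSE PC.CaseCreateDateTime
           END AS ImportantDate
    FROM PATIENTCASES AS PC
    WHERE PC.CompanyID = 'RxCRoads'
)
SELECT CaseID,
       ImportantDate,
       DATEDIFF(DAY, ImportantDate, GETDATE()) AS DaysSinceImportantDate
FROM CTE;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the logical reads per table match and the actual execution plans have the same shape, that suggests the CTE was simply inlined. Elapsed time on its own is a weaker signal, since returning millions of rows to the client is likely to dominate it.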
Code, Execution Plan and Statistics for the non-CTE version:
SELECT PC.CompanyID,
       PC.ClientID,
       PC.ProgramID,
       PC.PatientID,
       PC.CaseID,
       -- Use the fulfillment header date when one exists, else the case creation date.
       CASE
           WHEN PFH.FulFilHdrCreateDateTime IS NULL
               THEN PC.CaseCreateDateTime
           ELSE
               PFH.FulFilHdrCreateDateTime
       END AS [ImportantDate],
       -- The same CASE logic, copied inline for the second column.
       DATEDIFF(Day,
                CASE WHEN PFH.FulFilHdrCreateDateTime IS NULL THEN PC.CaseCreateDateTime ELSE PFH.FulFilHdrCreateDateTime END,
                GETDATE())
FROM PATIENTCASES PC
LEFT OUTER JOIN PATFULFILLMENTHEADER PFH
    ON PFH.CompanyID = PC.CompanyID
   AND PFH.ClientID = PC.ClientID
   AND PFH.ProgramID = PC.ProgramID
   AND PFH.PatientID = PC.PatientID
   AND PFH.CaseID = PC.CaseID
   -- Join only the header with the lowest FulFilHdrID for each case.
   AND PFH.FulFilHdrID = (SELECT TOP(1) temp.FulFilHdrID
                          FROM PATFULFILLMENTHEADER temp
                          WHERE temp.CompanyID = PC.CompanyID
                            AND temp.ClientID = PC.ClientID
                            AND temp.ProgramID = PC.ProgramID
                            AND temp.PatientID = PC.PatientID
                            AND temp.CaseID = PC.CaseID
                          ORDER BY temp.FulFilHdrID
                         )
WHERE PC.CompanyID = 'RxCRoads'
+---------------------------------------------------------+----------+
| Query Profile Statistics | |
| Number of INSERT, DELETE and UPDATE statements | 0 |
| Rows affected by INSERT, DELETE, or UPDATE statements | 0 |
| Number of SELECT statements | 2 |
| Rows returned by SELECT statements | 2880384 |
| Number of transactions | 0 |
| Network Statistics | |
| Number of server roundtrips | 3 |
| TDS packets sent from client | 3 |
| TDS packets received from server | 38523 |
| Bytes sent from client | 2128 |
| Bytes received from server | 157781300 |
| Time Statistics | |
| Client processing time | 10158 |
| Total execution time | 10158 |
| Wait time on server replies | 0 |
+---------------------------------------------------------+----------+
Code, Execution Plan and Statistics for the CTE version:
WITH CTE (CompanyID, ClientID, ProgramID, PatientID, CaseID, ImportantDate)
AS
(
    SELECT PC.CompanyID,
           PC.ClientID,
           PC.ProgramID,
           PC.PatientID,
           PC.CaseID,
           -- The decision logic is written once here and reused by name below.
           CASE
               WHEN PFH.FulFilHdrCreateDateTime IS NULL
                   THEN PC.CaseCreateDateTime
               ELSE
                   PFH.FulFilHdrCreateDateTime
           END AS [ImportantDate]
    FROM PATIENTCASES PC
    LEFT OUTER JOIN PATFULFILLMENTHEADER PFH
        ON PFH.CompanyID = PC.CompanyID
       AND PFH.ClientID = PC.ClientID
       AND PFH.ProgramID = PC.ProgramID
       AND PFH.PatientID = PC.PatientID
       AND PFH.CaseID = PC.CaseID
       -- Join only the header with the lowest FulFilHdrID for each case.
       AND PFH.FulFilHdrID = (SELECT TOP(1) temp.FulFilHdrID
                              FROM PATFULFILLMENTHEADER temp
                              WHERE temp.CompanyID = PC.CompanyID
                                AND temp.ClientID = PC.ClientID
                                AND temp.ProgramID = PC.ProgramID
                                AND temp.PatientID = PC.PatientID
                                AND temp.CaseID = PC.CaseID
                              ORDER BY temp.FulFilHdrID
                             )
)
SELECT CompanyID,
       ClientID,
       ProgramID,
       PatientID,
       CaseID,
       ImportantDate,
       DATEDIFF(Day, [ImportantDate], GETDATE())
FROM CTE
WHERE CompanyID = 'RxCRoads'
+---------------------------------------------------------+-----------+
| Query Profile Statistics | |
| Number of INSERT, DELETE and UPDATE statements | 0 |
| Rows affected by INSERT, DELETE, or UPDATE statements | 0 |
| Number of SELECT statements | 2 |
| Rows returned by SELECT statements | 2880383 |
| Number of transactions | 0 |
| Network Statistics | |
| Number of server roundtrips | 3 |
| TDS packets sent from client | 3 |
| TDS packets received from server | 38523 |
| Bytes sent from client | 2348 |
| Bytes received from server | 157781800 |
| Time Statistics | |
| Client processing time | 9985 |
| Total execution time | 9985 |
| Wait time on server replies | 0 |
+---------------------------------------------------------+-----------+
Solution
It looks like you've done some solid analysis here, so I may not be able to add much.
One thing that often comes to my mind when folks start throwing around CTEs is this post from Erik Darling, "CTEs, Inline Views, and What They Do":
"To sum things up, CTEs are a great base from which you can reference and filter on items in the select list that you otherwise wouldn’t be able to (think windowing functions), but every time you reference a CTE, they get executed. The fewer times you have to hit a larger base set, and the fewer reads you do, the better. If you find yourself referencing CTEs more than once or twice, you should consider a temp or persisted table instead, with the proper indexes."
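As a rough sketch of the temp-table alternative mentioned there, the derived column could be materialized once and then read as many times as needed. The temp table name, index name, output aliases, and the second query are hypothetical. The OUTER APPLY is a swapped-in way of picking the first header per case, and it assumes FulFilHdrID is unique within a case; it is not the join from the question.

```sql
-- Compute ImportantDate once and store it (hypothetical temp table name).
SELECT PC.CompanyID,
       PC.ClientID,
       PC.ProgramID,
       PC.PatientID,
       PC.CaseID,
       CASE
           WHEN PFH.FulFilHdrCreateDateTime IS NULL THEN PC.CaseCreateDateTime
           ELSE PFH.FulFilHdrCreateDateTime
       END AS ImportantDate
INTO #ImportantDates
FROM PATIENTCASES AS PC
OUTER APPLY
(
    -- First fulfillment header (lowest FulFilHdrID) for this case, if any.
    SELECT TOP (1) H.FulFilHdrCreateDateTime
    FROM PATFULFILLMENTHEADER AS H
    WHERE H.CompanyID = PC.CompanyID
      AND H.ClientID  = PC.ClientID
      AND H.ProgramID = PC.ProgramID
      AND H.PatientID = PC.PatientID
      AND H.CaseID    = PC.CaseID
    ORDER BY H.FulFilHdrID
) AS PFH
WHERE PC.CompanyID = 'RxCRoads';

-- Index the stored rows for the lookups that follow (hypothetical index name).
CREATE CLUSTERED INDEX IX_ImportantDates
    ON #ImportantDates (ClientID, ProgramID, PatientID, CaseID);

-- First reference: the original output.
SELECT CaseID,
       ImportantDate,
       DATEDIFF(DAY, ImportantDate, GETDATE()) AS DaysSinceImportantDate
FROM #ImportantDates;

-- Second reference: reads the stored rows instead of re-running the join.
SELECT ClientID,
       MIN(ImportantDate) AS FirstImportantDate
FROM #ImportantDates
GROUP BY ClientID;

DROP TABLE #ImportantDates;
```

The trade-off is the cost of writing the rows to tempdb up front, so this is only likely to pay off when the base query is expensive and the result is read more than once.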
So in your specific case, since you reference the CTE only once and are not joining to it or otherwise correlating it with other datasets, it's unlikely that the CTE will cause you any issues.
Just be aware that if someone comes along and starts mucking with the query, joining the CTE to other tables, or back to itself, or adding another level of CTE "nesting" (to crunch more numbers for the final select) - then you will start to get into "operational overhead" territory.
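To make that concrete, here is a hypothetical follow-up change of the kind described: the CTE is joined back to an aggregate of itself, so it is referenced twice. The aliases and the DaysAfterFirst column are made up for illustration, and the OUTER APPLY is a shortened stand-in for the join in the question, assuming FulFilHdrID is unique within a case.

```sql
WITH CTE AS
(
    SELECT PC.ClientID,
           PC.CaseID,
           CASE
               WHEN PFH.FulFilHdrCreateDateTime IS NULL THEN PC.CaseCreateDateTime
               ELSE PFH.FulFilHdrCreateDateTime
           END AS ImportantDate
    FROM PATIENTCASES AS PC
    OUTER APPLY
    (
        -- First fulfillment header (lowest FulFilHdrID) for this case, if any.
        SELECT TOP (1) H.FulFilHdrCreateDateTime
        FROM PATFULFILLMENTHEADER AS H
        WHERE H.CompanyID = PC.CompanyID
          AND H.ClientID  = PC.ClientID
          AND H.ProgramID = PC.ProgramID
          AND H.PatientID = PC.PatientID
          AND H.CaseID    = PC.CaseID
        ORDER BY H.FulFilHdrID
    ) AS PFH
    WHERE PC.CompanyID = 'RxCRoads'
)
SELECT c.CaseID,
       c.ImportantDate,
       f.FirstImportantDate,
       DATEDIFF(DAY, f.FirstImportantDate, c.ImportantDate) AS DaysAfterFirst
FROM CTE AS c                          -- first reference to the CTE
JOIN
(
    SELECT ClientID,
           MIN(ImportantDate) AS FirstImportantDate
    FROM CTE                           -- second reference to the CTE
    GROUP BY ClientID
) AS f
    ON f.ClientID = c.ClientID;
```

SQL Server does not store a CTE's result, so a query shaped like this would typically show PATIENTCASES and PATFULFILLMENTHEADER being read once per reference in the actual execution plan. It is worth checking the plan rather than assuming either way.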