Common Table Expression (CTE) benefits?

https://dba.stackexchange.com/questions/14490

16-10-2019

Question

From MSDN:

Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

I'm using CTEs quite a lot, but I've never thought deeply about the benefits of using them.

If I reference a CTE multiple times in the same query:

Is there any performance benefit?
If I'm doing a self join, will SQL Server scan the target tables twice?

Solution

As a rule, a CTE will NEVER improve performance.

A CTE is essentially a disposable view. There are no additional statistics stored, no indexes, etc. It functions as a shorthand for a subquery.
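To illustrate the "shorthand for a subquery" point, here is a minimal sketch that writes the same query once with a CTE and once with a derived table. The table dbo.Orders and its columns customer_id and amount are hypothetical, invented only for this example. SQL Server would normally be expected to expand both forms the same way, though that is worth confirming against the actual execution plans.

```sql
-- Hypothetical table: dbo.Orders (customer_id, amount). Illustration only.

-- Form 1: the aggregation written as a CTE
WITH CustomerTotals AS (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM dbo.Orders
    GROUP BY customer_id
)
SELECT customer_id, total_amount
FROM CustomerTotals
WHERE total_amount > 1000;

-- Form 2: the same aggregation written as a derived table (subquery in FROM)
SELECT customer_id, total_amount
FROM (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM dbo.Orders
    GROUP BY customer_id
) AS CustomerTotals
WHERE total_amount > 1000;
```

Neither form stores anything: no statistics or indexes are created for CustomerTotals in either case.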

In my opinion they can be EASILY overused (I see a lot of overuse in code at my job). There are some good answers here, but if you need to refer to the result more than once, or it returns more than a few hundred thousand rows, put it into a #temp table instead and index it.
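A minimal sketch of the #temp table approach, assuming a hypothetical dbo.Orders table with customer_id and amount columns (these names, and the index name, are made up for illustration). The aggregation runs once, the result is indexed, and the temp table is then referenced twice in a self join.

```sql
-- Hypothetical table: dbo.Orders (customer_id, amount). Illustration only.

-- Materialize the intermediate result once
SELECT customer_id, SUM(amount) AS total_amount
INTO #CustomerTotals
FROM dbo.Orders
GROUP BY customer_id;

-- Index the temp table; unlike a CTE, it also gets its own statistics
CREATE CLUSTERED INDEX IX_CustomerTotals
    ON #CustomerTotals (customer_id);

-- Reference the stored result twice without repeating the aggregation
SELECT a.customer_id, a.total_amount, b.customer_id AS other_customer_id
FROM #CustomerTotals AS a
JOIN #CustomerTotals AS b
    ON b.total_amount = a.total_amount
   AND b.customer_id > a.customer_id;

DROP TABLE #CustomerTotals;
```

Whether this beats the CTE version depends on row counts and the cost of writing to tempdb, so it is worth comparing both plans on real data.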

OTHER TIPS

One place besides recursion where I find CTEs incredibly useful is complex reporting queries. I use a series of CTEs to get the chunks of data I need and then combine them in the final select. I find them easier to maintain than the same logic written with a lot of derived tables or 20 joins, and I can be more confident that the result is correct, with no duplicated rows caused by the one-to-many relationships in all the different joins. Let me give a quick example:

;WITH Conferences (Conference_id)
AS
(
    -- Conferences for client 10 that have at least one non-zero expense
    SELECT m.Conference_id
    FROM mydb.dbo.Conference m
    WHERE m.client_id = 10
        AND m.Conference_id IN
            (SELECT Conference_id
            FROM mydb.dbo.Expense
            WHERE amount <> 0
                AND amount IS NOT NULL)
)
--select * from Conferences
,MealEaters (NumberMealEaters, Conference_id, AttendeeType)
AS
(
    -- One row per conference and attendee type
    SELECT COUNT(*) AS NumberMealEaters, m.Conference_id, ma.AttendeeType
    FROM mydb.dbo.attendance ma
    JOIN Conferences m ON m.Conference_id = ma.Conference_id
    WHERE (ma.meals_consumed > 0 OR ma.meals_consumed IS NULL) AND ma.attended = 1
    GROUP BY m.Conference_id, ma.AttendeeType
)
--select * from MealEaters
,Expenses (Conference_id, expense_date, RecordIdentifier, amount)
AS
(
    -- One row per conference and expense record, restricted to the Conferences CTE
    SELECT Conference_id, MAX(expense_date) AS expense_date, RecordIdentifier, SUM(amount) AS amount
    FROM mydb.dbo.Expense
    WHERE amount <> 0
        AND Conference_id IN (SELECT Conference_id FROM Conferences)
    GROUP BY Conference_id, RecordIdentifier
)
--select * from Expenses
SELECT m.Conference_id, me.NumberMealEaters, me.AttendeeType, e.expense_date, e.RecordIdentifier, e.amount
FROM Conferences m
JOIN MealEaters me ON m.Conference_id = me.Conference_id
JOIN Expenses e ON e.Conference_id = m.Conference_id

By separating out the different chunks of information you want, you can check each part individually (uncomment one of the commented-out selects and run the query only as far as that select). If you need to change the expense calculation (in this example), it is easier to find than when everything is mixed together in one massive query. Of course, the actual reporting queries I use this for are generally much more complicated than the example.

As always, it depends, but there are cases where performance is greatly improved. I see it with INSERT INTO ... SELECT statements, where you use a CTE for the select and then use that in the INSERT INTO. It may have to do with RCSI (read committed snapshot isolation) being turned on for the database, but when very few rows are selected it can help quite a bit.
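For reference, the statement shape being described looks roughly like the sketch below. The target table mydb.dbo.ExpenseSummary and its columns are hypothetical, and the sketch only shows the syntax; it makes no claim about when or why this form would run faster.

```sql
-- Hypothetical target table: mydb.dbo.ExpenseSummary (Conference_id, total_amount).
-- The CTE defines the rows to select; the INSERT then reads from it.
;WITH NonZeroExpenses AS (
    SELECT Conference_id, amount
    FROM mydb.dbo.Expense
    WHERE amount <> 0
)
INSERT INTO mydb.dbo.ExpenseSummary (Conference_id, total_amount)
SELECT Conference_id, SUM(amount)
FROM NonZeroExpenses
GROUP BY Conference_id;
```

The CTE must come immediately before the INSERT, since a CTE is scoped to the single statement that follows it.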

Licensed under: CC-BY-SA with attribution
