Is an aggregated join more efficient than several subselects to set flags in a query?

https://stackoverflow.com/questions/22134172

19-10-2022

Question

I am querying two tables, one is the parent and the other is a child. The relationship from parent to child is 1..*.

My query has three SELECT 1-style subselects against the child table, each cast to a bit, like this:

SELECT Id
       ,...lots more columns...
       ,cast(coalesce(SELECT 1 FROM child WHERE c.ParentId = p.Id AND c.Field1 IS NULL), 0) as bit
       ,cast(coalesce(SELECT 1 FROM child WHERE c.ParentId = p.Id AND c.Field2 IS NULL), 0) as bit
       ,cast(coalesce(SELECT 1 WHERE EXISTS(SELECT * FROM Child c where c.ParentId = p.Id and [test several fields for NULL] )), 0) as bit
FROM Parent p
WHERE ...etc...

The objective is to select rows from the parent but have flags indicating whether a parent has any children with specific fields set to null.

Given a normal foreign key constraint between child and parent, anywhere from the high tens of thousands to the low millions of parent rows, and 10% to 25% of parent rows having children, is this an efficient SELECT?

I have avoided an explicit join with grouping because of the number of columns being selected. Would a join with an aggregate be more efficient in this case? Would several joins to the child (one per flag) be more efficient? Or would a CTE?
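For comparison, here is a minimal sketch of the join-with-aggregate alternative next to the correlated-subquery form. It runs against an in-memory SQLite database rather than SQL Server, so CASE WHEN EXISTS stands in for the cast(... as bit) flags; the Name column and the sample rows are hypothetical. Aggregating the child in a derived table and left-joining that to Parent means the parent's many columns never have to appear in a GROUP BY, which is the concern raised above.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a tiny, hypothetical Parent/Child schema in memory."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Parent (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Child (
            Id INTEGER PRIMARY KEY,
            ParentId INTEGER NOT NULL REFERENCES Parent(Id),
            Field1 TEXT,
            Field2 TEXT
        );
        INSERT INTO Parent (Id, Name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
        -- Parent 1: one child with Field1 NULL, one with Field2 NULL.
        -- Parent 2: one fully populated child. Parent 3: no children.
        INSERT INTO Child (ParentId, Field1, Field2) VALUES
            (1, NULL, 'x'), (1, 'y', NULL), (2, 'y', 'x');
    """)
    return con


# One correlated EXISTS per flag (the shape used in the question).
EXISTS_SQL = """
SELECT p.Id, p.Name,
       CASE WHEN EXISTS (SELECT 1 FROM Child c
                         WHERE c.ParentId = p.Id AND c.Field1 IS NULL)
            THEN 1 ELSE 0 END AS Field1Missing,
       CASE WHEN EXISTS (SELECT 1 FROM Child c
                         WHERE c.ParentId = p.Id AND c.Field2 IS NULL)
            THEN 1 ELSE 0 END AS Field2Missing
FROM Parent p
ORDER BY p.Id
"""

# Join with aggregate: the child is grouped on its own in a derived table,
# so no parent column has to be listed in a GROUP BY.
JOIN_AGG_SQL = """
SELECT p.Id, p.Name,
       COALESCE(f.Field1Missing, 0) AS Field1Missing,
       COALESCE(f.Field2Missing, 0) AS Field2Missing
FROM Parent p
LEFT JOIN (
    SELECT ParentId,
           MAX(CASE WHEN Field1 IS NULL THEN 1 ELSE 0 END) AS Field1Missing,
           MAX(CASE WHEN Field2 IS NULL THEN 1 ELSE 0 END) AS Field2Missing
    FROM Child
    GROUP BY ParentId
) f ON f.ParentId = p.Id
ORDER BY p.Id
"""

con = build_db()
exists_rows = con.execute(EXISTS_SQL).fetchall()
join_rows = con.execute(JOIN_AGG_SQL).fetchall()
print(exists_rows)
print(join_rows)
```

Both forms return the same flags; which one is faster depends on indexes and on how the optimizer treats each shape, which is what the answer below addresses.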

If the most efficient method cannot be determined from what I've specified above, what should I look for to determine the most efficient method?
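One concrete thing to look at is the query plan, in particular whether each correlated lookup can seek on an index over the child's foreign key column. In SQL Server that means the actual execution plan together with SET STATISTICS IO ON; the sketch below shows the same idea with SQLite's EXPLAIN QUERY PLAN as a stand-in. The index name ix_child_parent is hypothetical, and the exact plan text varies between SQLite versions, so the check only looks for the index name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Parent (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Child (
        Id INTEGER PRIMARY KEY,
        ParentId INTEGER NOT NULL REFERENCES Parent(Id),
        Field1 TEXT
    );
    -- Hypothetical index name; the indexed lookup column is what matters.
    CREATE INDEX ix_child_parent ON Child (ParentId);
""")

QUERY = """
SELECT p.Id,
       CASE WHEN EXISTS (SELECT 1 FROM Child c
                         WHERE c.ParentId = p.Id AND c.Field1 IS NULL)
            THEN 1 ELSE 0 END AS Field1Missing
FROM Parent p
"""


def plan_details(con: sqlite3.Connection, sql: str) -> list:
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the text.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]


details = plan_details(con, QUERY)
for line in details:
    print(line)

uses_index = any("ix_child_parent" in d for d in details)
print("correlated lookup uses index:", uses_index)
```

If the plan showed a full scan of the child table for every parent row instead, that would be the first thing to fix, whichever query shape is chosen.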

(Note: the minimum version of SQL Server this will run on is 2008 R2.)

Solution

First, if you are going to use a subquery with a NULL check, then use ISNULL(). Note that I much, much, much prefer the ANSI standard coalesce(), but SQL Server has done us the favor of documenting why ISNULL() is better (effort that IMHO could have been spent fixing the problem):

For example, when the code COALESCE((subquery), 1) is executed, the subquery is evaluated twice.

Second, I don't think your sample code will work. You need an additional set of parentheses around the subquery.
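A minimal sketch of the parenthesis point, again using in-memory SQLite as a stand-in for SQL Server. IFNULL plays the role of T-SQL's two-argument ISNULL here (the double-evaluation note above is about SQL Server's COALESCE). One more detail: in SQL Server a scalar subquery that returns more than one row raises an error ("Subquery returned more than 1 value"), so TOP 1 or EXISTS is needed too; SQLite silently takes the first row, and the sketch adds LIMIT 1 to keep that intent explicit. The sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Parent (Id INTEGER PRIMARY KEY);
    CREATE TABLE Child (ParentId INTEGER NOT NULL, Field1 TEXT);
    INSERT INTO Parent (Id) VALUES (1), (2);
    -- Parent 1 has two children with Field1 NULL; parent 2 has none.
    INSERT INTO Child (ParentId, Field1) VALUES (1, NULL), (1, NULL), (2, 'x');
""")

# Without parentheses around the subquery the statement does not parse.
BROKEN = """
SELECT p.Id,
       COALESCE(SELECT 1 FROM Child c
                WHERE c.ParentId = p.Id AND c.Field1 IS NULL, 0)
FROM Parent p
"""
try:
    con.execute(BROKEN)
    broken_parses = True
except sqlite3.OperationalError as exc:
    broken_parses = False
    print("syntax error:", exc)

# Parenthesised scalar subquery, limited to one row (TOP 1 in T-SQL),
# with a two-argument NULL replacement.
FIXED = """
SELECT p.Id,
       IFNULL((SELECT 1 FROM Child c
               WHERE c.ParentId = p.Id AND c.Field1 IS NULL
               LIMIT 1), 0) AS Field1Missing
FROM Parent p
ORDER BY p.Id
"""
rows = con.execute(FIXED).fetchall()
print(rows)  # [(1, 1), (2, 0)]
```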

Third, on to your question. With the right indexes (for example, on Child.ParentId), it is quite possible that the subquery method will perform better. In fact, the aggregation method has a weakness: the aggregate values are computed before the WHERE clause is applied. This can mean that all the aggregate values are computed when very, very few are needed.
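The weakness described above can be sketched at the logical level with in-memory SQLite. The derived aggregate produces a row for every parent that has children, while the outer WHERE keeps only a handful; the correlated form only needs answers for the parents that survive the filter. The Region column, the 'north' filter and the data are hypothetical, and real optimizers (SQL Server's included) can sometimes push a predicate into the derived table, so the execution plan is the final word.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Parent (Id INTEGER PRIMARY KEY, Region TEXT)")
con.execute("CREATE TABLE Child (ParentId INTEGER NOT NULL, Field1 TEXT)")
# 100 hypothetical parents, each with one child; every third child has a
# NULL Field1. Only the 5 parents in region 'north' are wanted.
con.executemany("INSERT INTO Parent VALUES (?, ?)",
                [(i, "north" if i <= 5 else "south") for i in range(1, 101)])
con.executemany("INSERT INTO Child VALUES (?, ?)",
                [(i, None if i % 3 == 0 else "x") for i in range(1, 101)])

# Aggregate-then-filter: the derived table logically holds one row per
# parent with children before the outer WHERE discards most of them.
DERIVED = """
SELECT ParentId, MAX(CASE WHEN Field1 IS NULL THEN 1 ELSE 0 END) AS Missing
FROM Child GROUP BY ParentId
"""
AGG_THEN_FILTER = f"""
SELECT p.Id, COALESCE(f.Missing, 0)
FROM Parent p LEFT JOIN ({DERIVED}) f ON f.ParentId = p.Id
WHERE p.Region = 'north'
ORDER BY p.Id
"""

# Correlated subquery: only evaluated for parents that pass the WHERE.
CORRELATED = """
SELECT p.Id,
       CASE WHEN EXISTS (SELECT 1 FROM Child c
                         WHERE c.ParentId = p.Id AND c.Field1 IS NULL)
            THEN 1 ELSE 0 END
FROM Parent p
WHERE p.Region = 'north'
ORDER BY p.Id
"""

derived_rows = len(con.execute(DERIVED).fetchall())
agg_rows = con.execute(AGG_THEN_FILTER).fetchall()
corr_rows = con.execute(CORRELATED).fetchall()
print(derived_rows, "aggregated rows for", len(corr_rows), "wanted parents")
print(corr_rows)
```

Both queries return the same flags; the difference is how much aggregation work is logically required to get there.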

So, your construct is fine. If looking up a value took a lot of time, then you could also write this using APPLY. That probably isn't necessary, though.

Licensed under: CC-BY-SA with attribution
