Question

MSSQL is doing something I don't understand, and I hope to find an answer here.

I have a small query that uses 2 sub-queries in the where clause:

where TerminatedDateTime between @startdate and @enddate
and Workgroup in (select distinct Workgroup from #grouping)
and Skills in (select Skills from #grouping) 

The query runs fine, but when I look at the execution plan I see the following: https://i.stack.imgur.com/ogkRP.png

The query select distinct Workgroup from #grouping has one result: "workgroup1"

The result of the query has 541 rows, but it still fetches all the rows within the date selection. If I remove the workgroup and skill part, the number of rows fetched is the same. The filtering is done in the hash match. If I enter the name in place of the select query, I see the following:

where TerminatedDateTime between @startdate and @enddate
and Workgroup in ('workgroup1')
and Skills in (select Skills from #grouping) 

https://i.stack.imgur.com/Ydq6C.png

Here it selects the correct number of rows and the query runs much better. Why is this, and is there a way to run the query with the sub-query and make it select only the relevant rows from the view? I have tried it with an inner join on the #grouping table, but with the same results: it selects too many rows.


Solution

I'm not sure why you need distinct in (select distinct Workgroup from #grouping).

The problem here is that the estimates are off. Without seeing the whole query and the execution plan XML, I'd suggest trying these alternatives:

  1. select workgroup and skills into a #temp table and join to it

  2. add option(recompile) to the statement

Each one should be a solution by itself.

It would be beneficial to see the execution plan XML anyway.
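
A rough sketch of what the two alternatives might look like. The view name dbo.CallView, the temp table #filter, and the date values are placeholders invented for illustration; only #grouping, Workgroup, Skills and TerminatedDateTime come from the question. Note that joining on the (Workgroup, Skills) pair only matches the original two IN filters if every row of #grouping is a pair that should be allowed together.

```sql
-- Placeholder parameter values; in the real query these already exist.
declare @startdate datetime = '20200101';
declare @enddate   datetime = '20200131';

-- Alternative 1: materialize the filter values, then join to them.
-- #filter and dbo.CallView are hypothetical names.
select distinct Workgroup, Skills
into #filter
from #grouping;

select v.*
from dbo.CallView as v
join #filter as f
  on f.Workgroup = v.Workgroup
 and f.Skills    = v.Skills   -- assumes Workgroup/Skills belong together as pairs
where v.TerminatedDateTime between @startdate and @enddate;

-- Alternative 2: keep the sub-queries, but compile the statement
-- on every execution so current row counts and variable values are used.
select v.*
from dbo.CallView as v
where v.TerminatedDateTime between @startdate and @enddate
  and v.Workgroup in (select Workgroup from #grouping)
  and v.Skills    in (select Skills from #grouping)
option (recompile);
```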

EDIT (after reviewing the execution plan, thx for making it available):

This query runs over a partitioned view. With check constraints in place, we can see that partition elimination was done properly, according to the runtime values of the @startdate and @enddate parameters.

Why did the optimizer produce different execution plans for the first query (the one with the subquery) and the second (the one with the scalar)?

As far as the optimizer is concerned, it's just a coincidence that the subquery produced only one row. It has to create an execution plan which will be valid for any output from the subquery, be it no rows, one or many.

OTOH, when you specify a scalar value, the optimizer is free to make more straightforward decisions.

Working with a partitioned view made the optimizer's job more difficult, hence my original recommendations proved useless.

Yes, the optimizer could probably do a better job here. BTW, are workgroups and skills correlated in any way?
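
If #grouping is known to hold exactly one workgroup, one thing that might be worth trying is to copy that value into a variable and compile the statement with option(recompile), so the optimizer sees the variable's current value much like a literal. This is only a sketch and has not been tested against the partitioned view in the question; dbo.CallView, the variable type and length, and the date values are assumptions.

```sql
-- Placeholder parameter values; in the real query these already exist.
declare @startdate datetime = '20200101';
declare @enddate   datetime = '20200131';

-- Type and length are assumed; match them to the real Workgroup column.
declare @workgroup nvarchar(100);

-- Only valid when #grouping contains a single distinct workgroup.
select top (1) @workgroup = Workgroup
from #grouping;

select v.*
from dbo.CallView as v          -- hypothetical name for the partitioned view
where v.TerminatedDateTime between @startdate and @enddate
  and v.Workgroup = @workgroup  -- scalar comparison instead of a sub-query
  and v.Skills in (select Skills from #grouping)
option (recompile);             -- compile with the variable's current value
```

With more than one workgroup this does not apply as written; the sub-query form or dynamic SQL that builds the IN list would be needed instead.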

Licensed under: CC-BY-SA with attribution