Question

I have a relatively simple query joining two tables. The "Where" criteria can be expressed either in the join criteria or as a where clause. I'm wondering which is more efficient.

Query is to find max sales for a salesman from the beginning of time until they were promoted.

Case 1

select salesman.salesmanid, max(sales.quantity)
from salesman
inner join sales  on salesman.salesmanid = sales.salesmanid
                  and sales.salesdate < salesman.promotiondate
group by salesman.salesmanid 

Case 2

select salesman.salesmanid, max(sales.quantity)
from salesman
inner join sales  on salesman.salesmanid = sales.salesmanid
where sales.salesdate < salesman.promotiondate
group by salesman.salesmanid 

Note: Case 1 lacks a WHERE clause altogether.

The RDBMS is SQL Server 2005.

EDIT: If the second piece of the join criteria or the WHERE clause were sales.salesdate < some fixed date, so that it is not actually a criterion for joining the two tables, does that change the answer?

Solution

I wouldn't use performance as the deciding factor here - and quite honestly, I don't think there's any measurable performance difference between those two cases, really.

I would always use case #2 - why? Because in my opinion, you should only put the actual criteria that establish the JOIN between the two tables into the JOIN clause - everything else belongs in the WHERE clause.

Just a matter of keeping things clean and putting things where they belong, IMO.

Obviously, there are cases with LEFT OUTER JOINs where the placement of the criteria does make a difference in terms of what results get returned - those cases would be excluded from my recommendation, of course.

Marc

OTHER TIPS

You can run the execution plan estimator and SQL Profiler to see how they stack up against each other.

However, they are semantically the same under the hood according to this SQL Server MVP:

http://www.eggheadcafe.com/conversation.aspx?messageid=29145383&threadid=29145379

I prefer to have any hard coded criteria in the join. It makes the SQL much more readable and portable.

Readability: You can see exactly what data you're going to get because all the table criteria are written right there in the join. In large statements, the criteria may be buried among 50 other expressions and are easily missed.

Portability: You can just copy a chunk out of the FROM clause and paste it somewhere else. That gives you the joins and any criteria you need to go with them. If you always use those criteria when joining those two tables, then putting them in the join is the most logical choice.

For example:

FROM
table1 t1
JOIN table2 t2_ABC ON
  t1.c1 = t2_ABC.c1 AND
  t2_ABC.c2 = 'ABC'

If you need to get a second column out of table 2, you just copy that block into Notepad, search/replace "ABC", and presto: an entire new block of code ready to paste back in.
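To make the copy-and-replace idea concrete, here is a minimal sketch. The table variables @table1 and @table2 stand in for table1 and table2, and the column c3, the value 'XYZ', and the sample rows are all made up for illustration; they are not from the original example.

```sql
-- Hypothetical stand-ins for table1 and table2 so the script runs on its own.
DECLARE @table1 TABLE (c1 INT);
DECLARE @table2 TABLE (c1 INT, c2 VARCHAR(10), c3 INT);  -- c3 is an assumed value column

INSERT INTO @table1 (c1) VALUES (1);
INSERT INTO @table2 (c1, c2, c3) VALUES (1, 'ABC', 100);
INSERT INTO @table2 (c1, c2, c3) VALUES (1, 'XYZ', 200);

SELECT
  t1.c1,
  t2_ABC.c3 AS abc_value,
  t2_XYZ.c3 AS xyz_value
FROM
@table1 t1
JOIN @table2 t2_ABC ON
  t1.c1 = t2_ABC.c1 AND
  t2_ABC.c2 = 'ABC'
-- The block below is the block above with "ABC" replaced by "XYZ".
JOIN @table2 t2_XYZ ON
  t1.c1 = t2_XYZ.c1 AND
  t2_XYZ.c2 = 'XYZ';
```

Because each join carries its own criteria, the pasted block needs no matching change in the WHERE clause.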

Additional: It's also easier to change between an inner and outer join without having to worry about any criteria that may be floating around in the WHERE clause.

I reserve the WHERE clause strictly for run-time criteria where possible.

As for efficiency: if you're referring to execution speed, then as everyone else has stated, the choice makes no difference. If you're referring to easier debugging and reuse, then I prefer option 1.

One last thing I noticed: both ways may give the same performance, or putting the criteria in the WHERE clause may be slightly faster, as some answers suggest.

But I found one difference that you can use for your logical needs (it applies to outer joins):

  1. Putting the criteria in the ON clause does not filter out rows from the outer table; instead, the columns from the joined table come back as NULL where the condition is not met

  2. Putting the criteria in the WHERE clause filters those rows out of the entire result
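A small sketch of the two points above, assuming a LEFT OUTER JOIN (with the INNER JOIN from the question both forms return the same rows). The table variables and sample rows are made up for illustration; only the column names come from the question.

```sql
-- Hypothetical sample data; table variables keep the script self-contained.
DECLARE @salesman TABLE (salesmanid INT, promotiondate DATETIME);
DECLARE @sales    TABLE (salesmanid INT, salesdate DATETIME, quantity INT);

INSERT INTO @salesman (salesmanid, promotiondate) VALUES (1, '20050101');
INSERT INTO @salesman (salesmanid, promotiondate) VALUES (2, '20050101');

INSERT INTO @sales (salesmanid, salesdate, quantity) VALUES (1, '20040601', 10);  -- before promotion
INSERT INTO @sales (salesmanid, salesdate, quantity) VALUES (2, '20060601', 20);  -- after promotion

-- 1. Date criterion in ON: salesman 2 is kept, with NULL for the sales column.
--    Returns (1, 10) and (2, NULL).
SELECT s.salesmanid, sa.quantity
FROM @salesman s
LEFT OUTER JOIN @sales sa ON s.salesmanid = sa.salesmanid
                         AND sa.salesdate < s.promotiondate;

-- 2. Date criterion in WHERE: the row for salesman 2 is filtered out entirely.
--    Returns (1, 10) only.
SELECT s.salesmanid, sa.quantity
FROM @salesman s
LEFT OUTER JOIN @sales sa ON s.salesmanid = sa.salesmanid
WHERE sa.salesdate < s.promotiondate;
```

In the second query the WHERE condition evaluates to unknown for the NULL-extended row, so the outer join effectively behaves like an inner join.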

I don't think you'll find a definitive answer for this one that applies to all cases. The two are not always interchangeable, since for some queries (some left joins) you will get different results by placing the criteria in the WHERE clause vs. the FROM clause.

In your case, you should evaluate both of these queries. In SSMS, you can view the estimated and actual execution plans of both of these queries - that would be a good first step in determining which is more optimal. You could also view the time & IO for each (set statistics time on, set statistics io on) - and that will also give you information to make your decision.

In the case of the queries in your question - I'd bet that they'll both come out with the same query plan - so in this case it may not matter, but in others it could potentially produce different plans.

Try this to see the difference between the two:

SET STATISTICS IO ON
SET STATISTICS TIME ON

select salesman.salesmanid, 
       max(sales.quantity)
from   salesman inner join sales on salesman.salesmanid = sales.salesmanid
       and sales.salesdate < salesman.promotiondate
group by salesman.salesmanid

select salesman.salesmanid, 
       max(sales.quantity)
from   salesman inner join sales on salesman.salesmanid = sales.salesmanid 
where  sales.salesdate < salesman.promotiondate
group by salesman.salesmanid

SET STATISTICS TIME OFF
SET STATISTICS IO OFF

It may seem flippant, but the answer is whichever query the query analyzer produces the most efficient plan for.

To my mind, they seem to be equivalent, so the query analyzer may well produce identical plans, but you'd have to test.

Neither is more efficient. Using the WHERE method is considered the old way to do so (http://msdn.microsoft.com/en-us/library/ms190014.aspx). You can look at the execution plan and see that they do the same thing.

Become familiar with the Estimated Execution Plan in SQL Management Studio!! Like others have said, you're at the mercy of the analyzer no matter what you do so trust its estimates. I would guess the two you provided would produce the exact same plan.
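If you would rather get the estimated plans as text than through the graphical view, something like the following should work. It is only a sketch and assumes the salesman and sales tables from the question exist in the current database.

```sql
-- With SHOWPLAN_TEXT on, the queries are compiled but not executed;
-- SQL Server returns the estimated plan for each statement instead.
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_TEXT ON;
GO

-- Case 1: date criterion in the JOIN
select salesman.salesmanid, max(sales.quantity)
from salesman
inner join sales on salesman.salesmanid = sales.salesmanid
                and sales.salesdate < salesman.promotiondate
group by salesman.salesmanid;
GO

-- Case 2: date criterion in the WHERE clause
select salesman.salesmanid, max(sales.quantity)
from salesman
inner join sales on salesman.salesmanid = sales.salesmanid
where sales.salesdate < salesman.promotiondate
group by salesman.salesmanid;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

If the two plan texts come back identical, the placement of the criterion made no difference for that query.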

If it's an attempt to change a development culture, pick the one that gives you a better plan; for the ones that are identical, follow the culture.

I've commented this on other "efficiency" posts like this one (it's both sincere and sarcastic) -- if this is where your bottlenecks reside, then high-five to you and your team.

Case 1 (criteria in the JOIN) is better for encapsulation, and increased encapsulation is usually a good thing: fewer copy/paste omissions when moving the join to another query, fewer bugs if it is later converted to a LEFT JOIN, and better readability (related stuff together and less "noise" in the WHERE clause). In this case, the WHERE clause only captures principal table criteria or criteria that span multiple tables.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow