Question

The following two SQL queries return the same results:

    SELECT * FROM Table1
    WHERE Table1.Value1 = 'something' OR Table1.Value2 IN (SELECT Value2 FROM Table2)

    SELECT * FROM Table1
    LEFT JOIN Table2 
    ON Table1.Value2 = Table2.Value2
    WHERE (Table1.Value1 = 'something' OR Table2.Value2 IS NOT NULL)

Similarly, these two queries return the same results:

    SELECT * FROM Table1
    WHERE Table1.Value1 = 'something' AND Table1.Value2 NOT IN (SELECT Value2 FROM Table2)

    SELECT * FROM Table1
    LEFT JOIN Table2
    ON Table1.Value2 = Table2.Value2
    WHERE Table1.Value1 = 'something' AND Table2.Value2 IS NULL

Personally, I find it easier to read the options that use "IN" or "NOT IN" (especially since my real query already has a pile of joins in it). However, if the number of values in Table2 grows large (currently it only returns three results), will this query become inefficient? Or will the query optimizer figure it out and turn it into a join behind the scenes? I'm using SQL Server 2012.

Solution

The first would be better as:

    SELECT <cols>
      FROM dbo.Table1
      WHERE Value1 = 'something'
      OR EXISTS (SELECT 1 FROM dbo.Table2 WHERE Value2 = Table1.Value2);

Your performance problem, though, is going to be the OR (assuming Value2 is indexed in both tables and you really select only the columns you need, instead of forcing a scan or a lookup with *). If Value1 is properly indexed, you might consider this alternative, at least to test the difference in performance. While Table2 holds just three rows, compare the execution plans rather than just measuring elapsed time:

    SELECT <cols>
      FROM dbo.Table1
      WHERE Value1 = 'something'
    UNION ALL
    SELECT <cols>
      FROM dbo.Table1
      WHERE Value1 <> 'something'
      AND EXISTS (SELECT 1 FROM dbo.Table2 WHERE Value2 = Table1.Value2);

Note that Value1 <> 'something' is not true when Value1 is NULL. If Value1 is nullable, the second branch needs (Value1 <> 'something' OR Value1 IS NULL) to return the same rows as the OR version.

For the NOT IN query, this will be both more reliable and at least as efficient as the two options you offered:

    SELECT <cols>
      FROM dbo.Table1
      WHERE Value1 = 'something'
      AND NOT EXISTS (SELECT 1 FROM dbo.Table2 WHERE Value2 = Table1.Value2);

Indexing is going to be key here, but it is important to understand that NOT IN and LEFT OUTER JOIN can both land you in trouble. In particular, NOT IN returns no rows at all if Table2.Value2 contains a NULL. See the following article:

http://www.sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join
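
To make the reliability point concrete, here is a minimal sketch of how NOT IN and NOT EXISTS diverge once the subquery column holds a NULL. The temp tables #Table1 and #Table2 and their sample rows are assumptions made up for illustration; they are not the asker's real schema.

```sql
-- Hypothetical temp tables standing in for Table1 and Table2.
CREATE TABLE #Table1 (Value1 varchar(20) NOT NULL, Value2 int NULL);
CREATE TABLE #Table2 (Value2 int NULL);

INSERT INTO #Table1 (Value1, Value2) VALUES ('something', 1), ('something', 2);
INSERT INTO #Table2 (Value2) VALUES (1), (NULL);

-- NOT IN: "2 NOT IN (1, NULL)" evaluates to UNKNOWN, so no rows are returned.
SELECT Value1, Value2
  FROM #Table1
  WHERE Value1 = 'something'
  AND Value2 NOT IN (SELECT Value2 FROM #Table2);

-- NOT EXISTS: the NULL row in #Table2 never matches anything,
-- so the row with Value2 = 2 is returned.
SELECT t1.Value1, t1.Value2
  FROM #Table1 AS t1
  WHERE t1.Value1 = 'something'
  AND NOT EXISTS (SELECT 1 FROM #Table2 AS t2 WHERE t2.Value2 = t1.Value2);

DROP TABLE #Table1;
DROP TABLE #Table2;
```

If Value2 is declared NOT NULL in Table2, the two forms return the same rows, and the difference comes down to the plans.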

OTHER TIPS

  • These two queries are not equivalent. With IN (or NOT IN), each row in Table1 appears in the result zero or one time. With a join, each row may appear zero, one, or many times. So the two queries "return the same results" only because of your specific data, or because Table2 has a unique index or primary key on Value2.

  • Using UNION, as in

    SELECT ... WHERE Table1.Value1 = 'something'
    UNION (ALL)
    SELECT ... WHERE Table1.Value2 = Table2.Value2

may also give a different result, because UNION removes duplicates (including duplicate rows you may want to keep), while UNION ALL may return a row twice if it matches both criteria.

  • If you use EXISTS() instead of IN() in the first query, you will most likely get an identical execution plan, because the optimizer recognizes that the two forms are equivalent and chooses the same plan for both.

  • Even though you write a subquery in your statement, the optimizer may build a plan that does not execute it as a subquery. In other words, two equivalent queries written differently will most likely be optimized to the same plan.

  • For very complex queries this may not hold, because the optimizer may not have enough time to complete optimization in full and stops with the best plan found so far. In that case, such different-but-equivalent queries may end up with different plans and different performance. You need to try and test.
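
The first point above, about joins multiplying rows, can be sketched as follows. The temp tables and sample rows are hypothetical; the only thing that matters is that #Table2 has no unique constraint on Value2.

```sql
-- Hypothetical temp tables; #Table2 deliberately allows duplicate Value2 values.
CREATE TABLE #Table1 (Value1 varchar(20) NOT NULL, Value2 int NOT NULL);
CREATE TABLE #Table2 (Value2 int NOT NULL);

INSERT INTO #Table1 (Value1, Value2) VALUES ('other', 1);
INSERT INTO #Table2 (Value2) VALUES (1), (1);

-- IN: the single #Table1 row is returned once.
SELECT t1.Value1, t1.Value2
  FROM #Table1 AS t1
  WHERE t1.Value1 = 'something'
  OR t1.Value2 IN (SELECT Value2 FROM #Table2);

-- LEFT JOIN: the same row is returned twice, once per matching #Table2 row.
SELECT t1.Value1, t1.Value2
  FROM #Table1 AS t1
  LEFT JOIN #Table2 AS t2
  ON t1.Value2 = t2.Value2
  WHERE t1.Value1 = 'something'
  OR t2.Value2 IS NOT NULL;

DROP TABLE #Table1;
DROP TABLE #Table2;
```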

Plans and performance depend on the data, the type of parameters (constants, variables, calculated values), statistics, and indexes. For some combinations of these factors Query 1 will be more optimal than Query 2, and vice versa for others.

To get the right answer, you need to analyze and compare the execution plans.
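
One possible way to run that comparison in a query window is sketched below. It assumes the dbo.Table1 and dbo.Table2 tables from the question, with only the Value1 and Value2 columns selected; adjust the column list to your real query.

```sql
-- Report I/O and timing, and return the actual plan as XML for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SET STATISTICS XML ON;

-- Variant 1: IN
SELECT t1.Value1, t1.Value2
  FROM dbo.Table1 AS t1
  WHERE t1.Value1 = 'something'
  OR t1.Value2 IN (SELECT Value2 FROM dbo.Table2);

-- Variant 2: EXISTS
SELECT t1.Value1, t1.Value2
  FROM dbo.Table1 AS t1
  WHERE t1.Value1 = 'something'
  OR EXISTS (SELECT 1 FROM dbo.Table2 AS t2 WHERE t2.Value2 = t1.Value2);

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

If both statements produce the same plan shape and the same logical reads, the choice between them comes down to readability.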

Licensed under: CC-BY-SA with attribution