Does the EXISTS clause return only distinct rows by its nature?

https://dba.stackexchange.com/questions/138286

01-10-2020

Question

I have two queries that appear to be logically equivalent, yet they return different result sets. I am using the AdventureWorks2012 database.

The first query uses an EXISTS clause with a subquery:

SELECT p.FirstName, p.LastName, e.JobTitle
FROM Person.Person AS p 
JOIN HumanResources.Employee AS e
   ON e.BusinessEntityID = p.BusinessEntityID 
WHERE EXISTS
(SELECT *
    FROM HumanResources.Department AS d
    JOIN HumanResources.EmployeeDepartmentHistory AS edh
       ON d.DepartmentID = edh.DepartmentID
    WHERE e.BusinessEntityID = edh.BusinessEntityID
    AND d.Name LIKE 'P%')

The second query only uses JOINs:

SELECT p.FirstName, p.LastName, e.JobTitle
FROM Person.Person AS p 
JOIN HumanResources.Employee AS e
   ON e.BusinessEntityID = p.BusinessEntityID 
JOIN HumanResources.EmployeeDepartmentHistory AS edh
    ON p.BusinessEntityID = edh.BusinessEntityID
JOIN HumanResources.Department AS d
    ON edh.DepartmentID = d.DepartmentID
WHERE d.Name LIKE 'P%'
AND e.BusinessEntityID = edh.BusinessEntityID

The first query returns 198 rows, while the second query returns 199.
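The same effect can be reproduced without AdventureWorks. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The tables are hypothetical, heavily simplified versions (no schemas, only the columns the queries use, made-up rows), and the JOIN query drops the redundant e.BusinessEntityID = edh.BusinessEntityID predicate, which the join conditions already imply. One employee has two history rows that both match 'P%', so the JOIN query returns one more row than the EXISTS query.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the AdventureWorks tables.
SETUP = """
CREATE TABLE Person (BusinessEntityID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Employee (BusinessEntityID INTEGER PRIMARY KEY, JobTitle TEXT);
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE EmployeeDepartmentHistory (BusinessEntityID INTEGER, DepartmentID INTEGER);
INSERT INTO Person VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kim'), (3, 'Cara', 'Diaz');
INSERT INTO Employee VALUES (1, 'Buyer'), (2, 'Technician'), (3, 'Engineer');
INSERT INTO Department VALUES (1, 'Production'), (2, 'Purchasing'), (3, 'Engineering');
-- Ann moved from Production to Purchasing: two history rows that both match 'P%'.
INSERT INTO EmployeeDepartmentHistory VALUES (1, 1), (1, 2), (2, 1), (3, 3);
"""

# Each person/employee row is kept once if any 'P%' department matches.
EXISTS_QUERY = """
SELECT p.FirstName, p.LastName, e.JobTitle
FROM Person AS p
JOIN Employee AS e ON e.BusinessEntityID = p.BusinessEntityID
WHERE EXISTS (SELECT *
              FROM Department AS d
              JOIN EmployeeDepartmentHistory AS edh ON d.DepartmentID = edh.DepartmentID
              WHERE e.BusinessEntityID = edh.BusinessEntityID
                AND d.Name LIKE 'P%')
"""

# One output row per matching history row, so Ann appears twice.
JOIN_QUERY = """
SELECT p.FirstName, p.LastName, e.JobTitle
FROM Person AS p
JOIN Employee AS e ON e.BusinessEntityID = p.BusinessEntityID
JOIN EmployeeDepartmentHistory AS edh ON p.BusinessEntityID = edh.BusinessEntityID
JOIN Department AS d ON edh.DepartmentID = d.DepartmentID
WHERE d.Name LIKE 'P%'
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
exists_rows = conn.execute(EXISTS_QUERY).fetchall()
join_rows = conn.execute(JOIN_QUERY).fetchall()
print(len(exists_rows), len(join_rows))  # 2 3
```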

I then ran both queries against each other with an EXCEPT clause to see which row was disappearing:

SELECT p.FirstName, p.LastName, e.JobTitle
FROM Person.Person AS p 
JOIN HumanResources.Employee AS e
   ON e.BusinessEntityID = p.BusinessEntityID 
JOIN HumanResources.EmployeeDepartmentHistory AS edh
    ON p.BusinessEntityID = edh.BusinessEntityID
JOIN HumanResources.Department AS d
    ON edh.DepartmentID = d.DepartmentID
WHERE d.Name LIKE 'P%'
AND e.BusinessEntityID = edh.BusinessEntityID

EXCEPT

SELECT p.FirstName, p.LastName, e.JobTitle
FROM Person.Person AS p 
JOIN HumanResources.Employee AS e
   ON e.BusinessEntityID = p.BusinessEntityID 
WHERE EXISTS
(SELECT *
    FROM HumanResources.Department AS d
    JOIN HumanResources.EmployeeDepartmentHistory AS edh
       ON d.DepartmentID = edh.DepartmentID
    WHERE e.BusinessEntityID = edh.BusinessEntityID
    AND d.Name LIKE 'P%')

Surprisingly, no rows were returned. But when I added DISTINCT to my JOIN query, both queries finally returned the same result set.

Am I correct in assuming that, to make queries of this kind logically equivalent, I would have to add DISTINCT to the query that uses only JOINs?

What is the benefit of using subqueries like this? To me, JOINs make much more sense and make it easier to understand what is going on.

Solution

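An EXISTS predicate doesn't actually return any rows. It checks whether at least one matching row exists and then moves on, so each row of the outer query is returned at most once, no matter how many rows the subquery matches. That doesn't make the result distinct, though: EXISTS never adds duplicates, but it doesn't remove any that the outer query already produces either.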
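Here is a small sketch of that difference, again using Python's sqlite3 module with hypothetical, simplified tables. A second, made-up employee shares Bob Kim's name and job title: EXISTS keeps both people, while DISTINCT on the name and title columns of the JOIN query collapses them. So, at least on data like this, DISTINCT only reproduces the EXISTS result when the selected columns identify each outer row, for example by including p.BusinessEntityID.

```python
import sqlite3

# Hypothetical, simplified tables. Persons 2 and 4 are different people who
# happen to share a name and job title.
SETUP = """
CREATE TABLE Person (BusinessEntityID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Employee (BusinessEntityID INTEGER PRIMARY KEY, JobTitle TEXT);
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE EmployeeDepartmentHistory (BusinessEntityID INTEGER, DepartmentID INTEGER);
INSERT INTO Person VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kim'), (3, 'Cara', 'Diaz'), (4, 'Bob', 'Kim');
INSERT INTO Employee VALUES (1, 'Buyer'), (2, 'Technician'), (3, 'Engineer'), (4, 'Technician');
INSERT INTO Department VALUES (1, 'Production'), (2, 'Purchasing'), (3, 'Engineering');
INSERT INTO EmployeeDepartmentHistory VALUES (1, 1), (1, 2), (2, 1), (3, 3), (4, 1);
"""

EXISTS_QUERY = """
SELECT p.FirstName, p.LastName, e.JobTitle
FROM Person AS p
JOIN Employee AS e ON e.BusinessEntityID = p.BusinessEntityID
WHERE EXISTS (SELECT *
              FROM Department AS d
              JOIN EmployeeDepartmentHistory AS edh ON d.DepartmentID = edh.DepartmentID
              WHERE e.BusinessEntityID = edh.BusinessEntityID
                AND d.Name LIKE 'P%')
"""

JOIN_QUERY = """
SELECT p.FirstName, p.LastName, e.JobTitle
FROM Person AS p
JOIN Employee AS e ON e.BusinessEntityID = p.BusinessEntityID
JOIN EmployeeDepartmentHistory AS edh ON p.BusinessEntityID = edh.BusinessEntityID
JOIN Department AS d ON edh.DepartmentID = d.DepartmentID
WHERE d.Name LIKE 'P%'
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

def row_count(sql):
    """Number of rows a query returns."""
    return len(conn.execute(sql).fetchall())

n_exists = row_count(EXISTS_QUERY)  # 3: Ann, Bob (2), Bob (4)
n_join = row_count(JOIN_QUERY)      # 4: Ann twice, each Bob once
# DISTINCT on name and title collapses the two different Bobs into one row.
n_distinct = row_count(JOIN_QUERY.replace("SELECT", "SELECT DISTINCT", 1))
# Adding the key to the DISTINCT column list keeps them apart again.
n_distinct_key = row_count(
    JOIN_QUERY.replace("SELECT", "SELECT DISTINCT p.BusinessEntityID,", 1))
print(n_exists, n_join, n_distinct, n_distinct_key)  # 3 4 2 3
```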

Your problem is probably a duplicate row caused by the JOINs to the last two tables. Try running this:

SELECT p.FirstName, p.LastName, e.JobTitle, COUNT(1) AS Cnt
FROM Person.Person AS p 
JOIN HumanResources.Employee AS e
   ON e.BusinessEntityID = p.BusinessEntityID 
JOIN HumanResources.EmployeeDepartmentHistory AS edh
    ON p.BusinessEntityID = edh.BusinessEntityID
JOIN HumanResources.Department AS d
    ON edh.DepartmentID = d.DepartmentID
WHERE d.Name LIKE 'P%'
AND e.BusinessEntityID = edh.BusinessEntityID
GROUP BY p.FirstName, p.LastName, e.JobTitle
HAVING COUNT(1) > 1

I'm guessing you will get back one row with a count of 2.
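As an illustration only, here is the same GROUP BY ... HAVING check run through Python's sqlite3 module against hypothetical, simplified tables in which one employee has two history rows matching 'P%'. It returns exactly that kind of row.

```python
import sqlite3

# Hypothetical, simplified tables; Ann has two 'P%' department history rows.
SETUP = """
CREATE TABLE Person (BusinessEntityID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Employee (BusinessEntityID INTEGER PRIMARY KEY, JobTitle TEXT);
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE EmployeeDepartmentHistory (BusinessEntityID INTEGER, DepartmentID INTEGER);
INSERT INTO Person VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kim'), (3, 'Cara', 'Diaz');
INSERT INTO Employee VALUES (1, 'Buyer'), (2, 'Technician'), (3, 'Engineer');
INSERT INTO Department VALUES (1, 'Production'), (2, 'Purchasing'), (3, 'Engineering');
INSERT INTO EmployeeDepartmentHistory VALUES (1, 1), (1, 2), (2, 1), (3, 3);
"""

# Groups the JOIN output by the selected columns and keeps groups seen more than once.
DUPLICATES_QUERY = """
SELECT p.FirstName, p.LastName, e.JobTitle, COUNT(1) AS Cnt
FROM Person AS p
JOIN Employee AS e ON e.BusinessEntityID = p.BusinessEntityID
JOIN EmployeeDepartmentHistory AS edh ON p.BusinessEntityID = edh.BusinessEntityID
JOIN Department AS d ON edh.DepartmentID = d.DepartmentID
WHERE d.Name LIKE 'P%'
GROUP BY p.FirstName, p.LastName, e.JobTitle
HAVING COUNT(1) > 1
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
dupes = conn.execute(DUPLICATES_QUERY).fetchall()
print(dupes)  # [('Ann', 'Lee', 'Buyer', 2)]
```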

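You aren't seeing anything from the EXCEPT because all of the rows do exist in both places; you just have a duplicate in the second query. EXCEPT returns only the distinct rows from the left query that the right query doesn't return, so the two identical rows are treated as one before the comparison and the duplicate never shows up.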
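A sketch of this EXCEPT behavior, using Python's sqlite3 module and the same kind of hypothetical, simplified tables: EXCEPT comes back empty in both directions, DISTINCT brings the JOIN query down to the EXISTS count, and a Python-side multiset difference with collections.Counter, which keeps duplicates where EXCEPT does not, surfaces the extra row.

```python
import sqlite3
from collections import Counter

# Hypothetical, simplified tables; Ann has two 'P%' department history rows.
SETUP = """
CREATE TABLE Person (BusinessEntityID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Employee (BusinessEntityID INTEGER PRIMARY KEY, JobTitle TEXT);
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE EmployeeDepartmentHistory (BusinessEntityID INTEGER, DepartmentID INTEGER);
INSERT INTO Person VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kim'), (3, 'Cara', 'Diaz');
INSERT INTO Employee VALUES (1, 'Buyer'), (2, 'Technician'), (3, 'Engineer');
INSERT INTO Department VALUES (1, 'Production'), (2, 'Purchasing'), (3, 'Engineering');
INSERT INTO EmployeeDepartmentHistory VALUES (1, 1), (1, 2), (2, 1), (3, 3);
"""

EXISTS_QUERY = """
SELECT p.FirstName, p.LastName, e.JobTitle
FROM Person AS p
JOIN Employee AS e ON e.BusinessEntityID = p.BusinessEntityID
WHERE EXISTS (SELECT *
              FROM Department AS d
              JOIN EmployeeDepartmentHistory AS edh ON d.DepartmentID = edh.DepartmentID
              WHERE e.BusinessEntityID = edh.BusinessEntityID
                AND d.Name LIKE 'P%')
"""

JOIN_QUERY = """
SELECT p.FirstName, p.LastName, e.JobTitle
FROM Person AS p
JOIN Employee AS e ON e.BusinessEntityID = p.BusinessEntityID
JOIN EmployeeDepartmentHistory AS edh ON p.BusinessEntityID = edh.BusinessEntityID
JOIN Department AS d ON edh.DepartmentID = d.DepartmentID
WHERE d.Name LIKE 'P%'
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# EXCEPT compares distinct rows, so Ann's duplicate is collapsed before comparing.
except_rows = conn.execute(JOIN_QUERY + "EXCEPT" + EXISTS_QUERY).fetchall()
reverse_rows = conn.execute(EXISTS_QUERY + "EXCEPT" + JOIN_QUERY).fetchall()
# DISTINCT removes the duplicate, so the counts agree on this data.
distinct_rows = conn.execute(JOIN_QUERY.replace("SELECT", "SELECT DISTINCT", 1)).fetchall()
print(except_rows, reverse_rows, len(distinct_rows))  # [] [] 2

# A multiset difference (done in Python here) keeps duplicates and finds the extra row.
extra = Counter(conn.execute(JOIN_QUERY).fetchall()) - Counter(conn.execute(EXISTS_QUERY).fetchall())
print(list(extra.elements()))  # [('Ann', 'Lee', 'Buyer')]
```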

The duplicate comes from the joins to HumanResources.EmployeeDepartmentHistory and HumanResources.Department. Basically, the HumanResources.EmployeeDepartmentHistory table is used to map one employee to two departments, and because both history rows pass the d.Name LIKE 'P%' filter, the JOIN returns that employee once per matching row.

This is one of the major differences between using a JOIN and using EXISTS for the kind of logic you are trying to express.

Other tips

When running your last test to pop out the errant row with EXCEPT, you need to select all columns (*), not just the few you want. This will almost certainly reveal something.

Based on the table name, it is likely that an employee was in one department and then moved to another, so the JOIN query accidentally pulled back an additional department row. Because your query doesn't show the department column, you didn't notice.

That's why a SELECT * with the EXCEPT is more likely to show it. Otherwise, you might find a duplicate row with a different key, the bane of DBAs everywhere.
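To illustrate, here is a sketch using Python's sqlite3 module and hypothetical, simplified tables in which one employee has two 'P%' history rows. Adding just the department name to the select list is enough to make the two rows for the same employee distinguishable.

```python
import sqlite3

# Hypothetical, simplified tables; Ann moved from Production to Purchasing.
SETUP = """
CREATE TABLE Person (BusinessEntityID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Employee (BusinessEntityID INTEGER PRIMARY KEY, JobTitle TEXT);
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE EmployeeDepartmentHistory (BusinessEntityID INTEGER, DepartmentID INTEGER);
INSERT INTO Person VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kim'), (3, 'Cara', 'Diaz');
INSERT INTO Employee VALUES (1, 'Buyer'), (2, 'Technician'), (3, 'Engineer');
INSERT INTO Department VALUES (1, 'Production'), (2, 'Purchasing'), (3, 'Engineering');
INSERT INTO EmployeeDepartmentHistory VALUES (1, 1), (1, 2), (2, 1), (3, 3);
"""

# Same JOIN query, but with the department name added to the select list.
DETAIL_QUERY = """
SELECT p.FirstName, p.LastName, e.JobTitle, d.Name AS Department
FROM Person AS p
JOIN Employee AS e ON e.BusinessEntityID = p.BusinessEntityID
JOIN EmployeeDepartmentHistory AS edh ON p.BusinessEntityID = edh.BusinessEntityID
JOIN Department AS d ON edh.DepartmentID = d.DepartmentID
WHERE d.Name LIKE 'P%'
ORDER BY p.BusinessEntityID, d.Name
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(DETAIL_QUERY).fetchall()
for row in rows:
    print(row)
# ('Ann', 'Lee', 'Buyer', 'Production')
# ('Ann', 'Lee', 'Buyer', 'Purchasing')
# ('Bob', 'Kim', 'Technician', 'Production')
```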

If you think about it this way, you'll see it's not about EXISTS and distinct data. It's about selecting an additional row and not noticing.

Licensed under: CC-BY-SA with attribution
