Question

I am trying to pull the differences between two tables, each in a different database. I thought I could just do a full outer join but I am getting back too many results.

The T-SQL below returns 167 results

SELECT N'{'+ADobjectGUID+N'}' [GUID], first_name [First], last_name [Last], description [Full] FROM [db1].[dbo].[tb1] where delete_date is null
except
SELECT ObjectGUID [GUID], GivenName [First], sn [Last], displayName [Full] FROM [db2].[dbo].[tb1]

When I reverse it to be the below, it returns 214 results

SELECT ObjectGUID [GUID], GivenName [First], sn [Last], displayName [Full] FROM [db2].[dbo].[tb1]
except
SELECT N'{'+ADobjectGUID+N'}' [GUID], first_name [First], last_name [Last], description [Full] FROM [db1].[dbo].[tb1] where delete_date is null

However, I need more than just those columns from the databases, so I can't use the except statement.

select N'{'+ADobjectGUID+N'}' [GUID], first_name [First], last_name [Last], db1.dbo.tb1.description [Full], domain_user_id, sAMAccountName, employeeID 
from db1.dbo.tb1
full outer join [db2].[dbo].[tb1]
on ObjectGUID = N'{'+ADobjectGUID+N'}'
    and first_name = GivenName
    and last_name = sn
    and tb1.description = displayName
where db1.dbo.tb1.delete_date is null

That query results in 1784 results. Based on the above, I should be getting 371 results. What am I missing? I am still new to SQL and am sure it is something small.


Solution

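FULL joins can be tricky. Two things need to change: the delete_date filter has to be applied to db1 before the join (here in a derived table), and the WHERE clause has to keep only the rows that found no match on the other side. Try this version: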

select
    -- columns from db1 (alias a)
    a_GUID = N'{' + a.ADobjectGUID + N'}',
    a_First = a.first_name, 
    a_Last = a.last_name, 
    a_Full = a.description, 
    a.domain_user_id, 
    -- the question does not say which table owns the next two columns;
    -- change the alias to b if they live in db2
    a.sAMAccountName, 
    a.employeeID,

    -- columns from db2 (alias b)
    b_GUID = b.ObjectGUID,  
    b_First = b.GivenName, 
    b_Last = b.sn, 
    b_Full = b.displayName
from 
    ( select * 
      from db1.dbo.tb1 as a 
      where a.delete_date is null   -- filter db1 before joining
    ) as a
  full outer join 
    db2.dbo.tb1 as b
  on 
      N'{' + a.ADobjectGUID + N'}' = b.ObjectGUID
  and a.first_name = b.GivenName
  and a.last_name = b.sn
  and a.description = b.displayName
where 
      -- keep only rows with no match on the other side
      a.ADobjectGUID is null
   or b.ObjectGUID is null
  ; 
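
To see why the delete_date filter sits in a derived table rather than in the outer WHERE, here is a minimal sketch on made-up data. The table variables @a and @b, their columns, and the rows in them are hypothetical stand-ins, not the real tables.

```sql
-- Hypothetical miniature versions of the two tables.
declare @a table (id int, name nvarchar(20), delete_date date);
insert into @a values
(1, N'Ann', null),          -- live, also in @b
(2, N'Bob', '2020-01-01'),  -- deleted in @a, still present in @b
(3, N'Cid', null);          -- live, missing from @b

declare @b table (id int, name nvarchar(20));
insert into @b values
(1, N'Ann'), (2, N'Bob'), (4, N'Dee');

-- Filter in the outer WHERE: Bob first matches his deleted row, then the
-- whole joined row is discarded, so he is never reported as "only in @b".
select a.id as a_id, b.id as b_id
from @a as a
  full outer join @b as b
    on a.id = b.id and a.name = b.name
where a.delete_date is null
  and (a.id is null or b.id is null);
-- rows returned: (3, NULL) and (NULL, 4)

-- Filter inside a derived table: the deleted row never takes part in the
-- join, so Bob shows up as unmatched on the @b side.
select a.id as a_id, b.id as b_id
from (select * from @a where delete_date is null) as a
  full outer join @b as b
    on a.id = b.id and a.name = b.name
where a.id is null or b.id is null;
-- rows returned: (3, NULL), (NULL, 2) and (NULL, 4)
```

The second form agrees with what the two EXCEPT queries would report, because those also filter db1 before comparing.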

OTHER TIPS

"That query results in 1784 results. Based on the above, I should be getting 371 results"

No, that conclusion is wrong; it rests on a misunderstanding of full outer join. A full outer join does not produce the "difference" between two tables.

When you do t1 full join t2, you get all the row pairs matched by your ON condition, plus all the unmatched rows from the left table (with NULLs in the right table's columns), plus all the unmatched rows from the right table (with NULLs in the left table's columns).

So the result cardinality is greater than or equal to the cardinality of each table; it equals both only when every row on each side matches exactly one row on the other.

Maybe you forgot to filter your query? Here is a simple example of what you did:

declare @t1 table (col1 int, col2 int);
insert into @t1 values 
(1, 1), (2,2), (3,3);

declare @t2 table (col1 int, col2 int);
insert into @t2 values 
(1, 1), (2,2), (4,4), (5,5);

select * from @t1
except 
select * from @t2;

select * from @t2
except 
select * from @t1;

select *
from @t1 t1 full join @t2 t2
        on t1.col1 = t2.col1 and t1.col2 = t2.col2

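The full join returns five rows. Three are unmatched: (3, 3) from @t1 with NULLs in the @t2 columns, and (4, 4) and (5, 5) from @t2 with NULLs in the @t1 columns. The remaining two, (1, 1) and (2, 2), are the matched rows, with values on both sides.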
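
As a rough check on where the row count comes from, the full join result can be split into its three parts. This is only a sketch on the same sample data; the column aliases (total_rows, matched, only_in_t1, only_in_t2) are names chosen here for illustration, and it assumes col1 is never NULL in the base tables.

```sql
declare @t1 table (col1 int, col2 int);
insert into @t1 values (1, 1), (2, 2), (3, 3);

declare @t2 table (col1 int, col2 int);
insert into @t2 values (1, 1), (2, 2), (4, 4), (5, 5);

select
    count(*) as total_rows,                                   -- 5
    sum(case when t1.col1 is not null
              and t2.col1 is not null then 1 else 0 end)
        as matched,                                           -- 2
    sum(case when t2.col1 is null then 1 else 0 end)
        as only_in_t1,                                        -- 1
    sum(case when t1.col1 is null then 1 else 0 end)
        as only_in_t2                                         -- 2
from @t1 t1 full join @t2 t2
        on t1.col1 = t2.col1 and t1.col2 = t2.col2;
```

Only the last two counts correspond to what the EXCEPT queries return. An unfiltered full join also carries every matched row, which is why its total is so much larger than the sum of the two EXCEPT results.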

To filter the matched rows out, keep only the rows whose join columns are NULL on the "other" side, like this:

select *
from @t1 t1 full join @t2 t2
        on t1.col1 = t2.col1 and t1.col2 = t2.col2
where (t1.col1 is null and t1.col2 is null) or    
      (t2.col1 is null and t2.col2 is null); 

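This leaves three rows: (3, 3), which exists only in @t1, and (4, 4) and (5, 5), which exist only in @t2. They are the same rows the two EXCEPT queries above return.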
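
If it helps to see which side each leftover row came from, one possible variation labels the rows and folds the two halves into a single set of columns. This is a sketch, not part of the original answer: the source column and its labels are invented for illustration, and it assumes col1 is never NULL in either base table, so a NULL there can only mean "no match".

```sql
declare @t1 table (col1 int, col2 int);
insert into @t1 values (1, 1), (2, 2), (3, 3);

declare @t2 table (col1 int, col2 int);
insert into @t2 values (1, 1), (2, 2), (4, 4), (5, 5);

select
    -- after the WHERE below, exactly one side of each row is all NULLs
    case when t2.col1 is null then 'only in t1'
         else 'only in t2' end as source,
    coalesce(t1.col1, t2.col1) as col1,
    coalesce(t1.col2, t2.col2) as col2
from @t1 t1 full join @t2 t2
        on t1.col1 = t2.col1 and t1.col2 = t2.col2
where t1.col1 is null or t2.col1 is null;
-- rows returned: ('only in t1', 3, 3), ('only in t2', 4, 4), ('only in t2', 5, 5)
```

Columns that exist in only one table can be added to the select list in the same way; they come back as NULL for rows from the other side.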

Licensed under: CC-BY-SA with attribution