Difference between two tables
09-10-2020
Question
I am trying to pull the differences between two tables, each in a different database. I thought I could just do a full outer join but I am getting back too many results.
The T-SQL below returns 167 results
SELECT N'{'+ADobjectGUID+N'}' [GUID], first_name [First], last_name [Last], description [Full] FROM [db1].[dbo].[tb1] where delete_date is null
except
SELECT ObjectGUID [GUID], GivenName [First], sn [Last], displayName [Full] FROM [db2].[dbo].[tb1]
When I reverse it to be the below, it returns 214 results
SELECT ObjectGUID [GUID], GivenName [First], sn [Last], displayName [Full] FROM [db2].[dbo].[tb1]
except
SELECT N'{'+ADobjectGUID+N'}' [GUID], first_name [First], last_name [Last], description [Full] FROM [db1].[dbo].[tb1] where delete_date is null
However, I need more than just those columns from the databases, so I can't use the except statement.
select N'{'+ADobjectGUID+N'}' [GUID], first_name [First], last_name [Last], db1.dbo.tb1.description [Full], domain_user_id, sAMAccountName, employeeID
from db1.dbo.tb1
full outer join [db2].[dbo].[tb1]
on ObjectGUID = N'{'+ADobjectGUID+N'}'
and first_name = GivenName
and last_name = sn
and tb1.description = displayName
where db1.dbo.tb1.delete_date is null
That query returns 1784 results. Based on the above, I should be getting 371 results. What am I missing? I am still new to SQL and am sure it is something small.
Solution
FULL joins can be tricky. Try this version:
select
    -- columns from db1
    a_GUID  = N'{' + a.ADobjectGUID + N'}',
    a_First = a.first_name,
    a_Last  = a.last_name,
    a_Full  = a.description,
    -- assumed to live in db1; re-qualify with b. if they belong to db2
    a.domain_user_id,
    a.sAMAccountName,
    a.employeeID,
    -- columns from db2
    b_GUID  = b.ObjectGUID,
    b_First = b.GivenName,
    b_Last  = b.sn,
    b_Full  = b.displayName
from
    (   -- filter db1 before the join so deleted rows never take part in the match
        select *
        from db1.dbo.tb1
        where delete_date is null
    ) as a
full outer join
    db2.dbo.tb1 as b
on
        N'{' + a.ADobjectGUID + N'}' = b.ObjectGUID
    and a.first_name  = b.GivenName
    and a.last_name   = b.sn
    and a.description = b.displayName
where
       a.ADobjectGUID is null   -- no db1 row matched: present only in db2
    or b.ObjectGUID   is null   -- no db2 row matched: present only in db1
;
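If it helps to see at a glance which side each unmatched row is missing from, a label column can be added to this kind of join. The following is only a sketch: it assumes the column names from the question (ADobjectGUID in db1, ObjectGUID in db2), assumes that neither GUID column is ever NULL in its own table, and uses a made-up alias, missing_from, purely for illustration.

```sql
select
    -- hypothetical label column: which database has no matching row
    missing_from = case
                       when a.ADobjectGUID is null then 'db1'  -- row exists only in db2
                       else 'db2'                              -- row exists only in db1
                   end,
    a_GUID = N'{' + a.ADobjectGUID + N'}',
    b_GUID = b.ObjectGUID
from
    (   select *
        from db1.dbo.tb1
        where delete_date is null
    ) as a
full outer join
    db2.dbo.tb1 as b
on
        N'{' + a.ADobjectGUID + N'}' = b.ObjectGUID
    and a.first_name  = b.GivenName
    and a.last_name   = b.sn
    and a.description = b.displayName
where
       a.ADobjectGUID is null
    or b.ObjectGUID   is null
;
```

Grouping the result by missing_from should then give two counts that can be compared with the two except queries from the question.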
OTHER TIPS
"That query results in 1784 results. Based on the above, I should be getting 371 results"
No, that conclusion is wrong: you have misunderstood what a full outer join does. It does not produce the "difference" between two tables.
When you do t1 full join t2, you get all the rows that are matched by your ON condition, plus all the unmatched rows from the left table, plus all the unmatched rows from the right table.
So the result has at least as many rows as either table; it can equal a table's row count only when every row of the other table finds a match.
Maybe you forgot to filter your query? Here is a simple example of what you did:
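declare @t1 table (col1 int, col2 int);
insert into @t1 values
    (1, 1), (2, 2), (3, 3);

declare @t2 table (col1 int, col2 int);
insert into @t2 values
    (1, 1), (2, 2), (4, 4), (5, 5);

-- rows in @t1 that are not in @t2: returns (3, 3)
select * from @t1
except
select * from @t2;

-- rows in @t2 that are not in @t1: returns (4, 4) and (5, 5)
select * from @t2
except
select * from @t1;

-- unfiltered full join: matched and unmatched rows together
select *
from @t1 t1 full join @t2 t2
    on t1.col1 = t2.col1 and t1.col2 = t2.col2;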
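The full join returns five rows. The matched rows (highlighted in red in the original screenshot) are (1, 1) and (2, 2), where both sides are non-NULL.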
To filter them out, keep only the rows whose join columns are NULL on the "other" side, like this:
select *
from @t1 t1 full join @t2 t2
on t1.col1 = t2.col1 and t1.col2 = t2.col2
where (t1.col1 is null and t1.col2 is null) or
(t2.col1 is null and t2.col2 is null);
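The row counts can be checked on the same sample data by counting each kind of row in the full join. This is a small sketch rather than part of the original answer; the column aliases matched, only_in_t1 and only_in_t2 are made up for illustration, and it relies on col1 never being NULL in the source rows.

```sql
declare @t1 table (col1 int, col2 int);
insert into @t1 values
    (1, 1), (2, 2), (3, 3);

declare @t2 table (col1 int, col2 int);
insert into @t2 values
    (1, 1), (2, 2), (4, 4), (5, 5);

select
    -- both sides present: the join condition was satisfied
    matched    = sum(case when t1.col1 is not null and t2.col1 is not null then 1 else 0 end),
    -- right side missing: row exists only in @t1
    only_in_t1 = sum(case when t2.col1 is null then 1 else 0 end),
    -- left side missing: row exists only in @t2
    only_in_t2 = sum(case when t1.col1 is null then 1 else 0 end)
from @t1 t1 full join @t2 t2
    on t1.col1 = t2.col1 and t1.col2 = t2.col2;
-- matched = 2, only_in_t1 = 1, only_in_t2 = 2
```

The unfiltered full join has 2 + 1 + 2 = 5 rows, while the filtered version keeps only the 1 + 2 = 3 unmatched rows, which is the sum of the two except results for this data.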