Question

I'm going to design a table architecture. I want to compare the same data coming from different sources, say Source_A and Source_B. I have to compare a few attributes and identify:

  1. Mismatches
  2. Data that is missing in Source_A
  3. Data that is missing in Source_B

Finally, I have to report the results in Power BI with charts. For now I have two tables, A_DATA and B_DATA, and both have the structure below (this is just a sample; I have a lot more columns).

+---------------+
| Columns       |
+---------------+
| Material_ID   |
+---------------+
| Material_Name |
+---------------+
| Material_Type |
+---------------+
| Quantity      |
+---------------+

Now I'm confused about whether I should create a separate table for each of the 3 cases (Mismatch, Source_A missing, Source_B missing), or keep everything in a single table with one more column called Status. This is for reporting in Power BI (like out of 1K rows, 5K are mismatches). Please suggest. I'm really confused.


Solution

This is a common task that can be solved with a FULL OUTER JOIN on any expression that is NOT NULL, unique within each table, and present in both tables. Mismatches are detected by comparing columns; missing rows are detected by a NULL join key on the corresponding side.

SELECT COALESCE(t1.id, t2.id) AS id,
       CASE WHEN t1.id IS NULL
                 THEN 'Absent in TableA'
            WHEN t2.id IS NULL
                 THEN 'Absent in TableB'
            WHEN t1.columnX <> t2.columnX
                 THEN 'Differs in ColumnX at least' -- may differ in other columns too
            WHEN t1.columnY <> t2.columnY
                 THEN 'Differs in ColumnY at least' -- no difference found in columnX, per the previous condition
            -- and so on
            -- note: <> yields NULL when either side is NULL, so such rows fall through to ELSE
            ELSE 'Are identical'
            END AS MismatchType
FROM tableA t1
FULL OUTER JOIN tableB t2 ON t1.id = t2.id;
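
Applied to the tables in the question, the same pattern suggests keeping everything in one result with a single status column rather than three separate tables, and then counting rows per status for the charts. The following is only a sketch: A_DATA, B_DATA and the column names come from the question, while the CTE name `compared`, the status labels and `row_count` are assumptions. It also assumes Material_ID is unique and NOT NULL in each table and that the compared columns contain no NULLs.

```sql
-- Sketch only: the name "compared" and the status labels are illustrative assumptions.
WITH compared AS (
    SELECT COALESCE(a.Material_ID, b.Material_ID) AS Material_ID,
           CASE WHEN a.Material_ID IS NULL THEN 'Missing in Source_A'
                WHEN b.Material_ID IS NULL THEN 'Missing in Source_B'
                WHEN a.Material_Name <> b.Material_Name
                  OR a.Material_Type <> b.Material_Type
                  OR a.Quantity      <> b.Quantity THEN 'Mismatch'
                ELSE 'Match'
           END AS Status
    FROM A_DATA a
    FULL OUTER JOIN B_DATA b ON a.Material_ID = b.Material_ID
)
-- One row per status, which a chart can consume directly.
SELECT Status, COUNT(*) AS row_count
FROM compared
GROUP BY Status;
```

If the row-level detail is also needed in the report, the inner SELECT could be saved as a view or table and the grouping left to the reporting tool.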

If you need a detailed diagnosis of the differences, you have to build compound conditions that cover all the columns of interest, like

WHEN t1.columnX <> t2.columnX AND t1.columnY <> t2.columnY AND t1.columnZ = t2.columnZ
    THEN 'Differs in ColumnX and ColumnY'
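
An alternative to enumerating every combination of columns, not part of the answer above, is to emit one 0/1 flag per compared column, so each combination is simply a pattern of flags. A minimal sketch using the question's tables follows; the `*_Differs` column names are assumptions, and it again assumes that Material_ID is unique and that the compared columns contain no NULLs.

```sql
-- Sketch only: the *_Differs column names are illustrative assumptions.
-- Rows present in both sources that differ in at least one compared column.
SELECT a.Material_ID,
       CASE WHEN a.Material_Name <> b.Material_Name THEN 1 ELSE 0 END AS Name_Differs,
       CASE WHEN a.Material_Type <> b.Material_Type THEN 1 ELSE 0 END AS Type_Differs,
       CASE WHEN a.Quantity      <> b.Quantity      THEN 1 ELSE 0 END AS Quantity_Differs
FROM A_DATA a
INNER JOIN B_DATA b ON a.Material_ID = b.Material_ID
WHERE a.Material_Name <> b.Material_Name
   OR a.Material_Type <> b.Material_Type
   OR a.Quantity      <> b.Quantity;
```

Summing each flag column then gives the number of mismatches per attribute.
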
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange