Database Design For reporting comparison results
-
04-03-2021 - |
Question
I'm going to design an table architecture. Here I wanted to compare same data coming from different sources say Source_A
and Source_B
. I have to compare few attributes and identify
- Mismatches
- Data that are missing in source_a
- Data that are missing in Source_B.
Finally i have report to the same in PowerBI with charts. For now I have 2 tables A_DATA
and B_DATA
and both are having below structure (this is just a sample, I have lot more columns)
+---------------+
| Columns |
+---------------+
| Material_ID |
+---------------+
| Material_Name |
+---------------+
| Material_Type |
+---------------+
| Quantity |
+---------------+
Now I'm confused whether should I create separate table for 3 cases(Mismatch,Source_A missing,Source_B Missing) or In single table I should have one more column saying Status
and keep everything there. For reporting in PowerBI (like out of 1K rows, 5K are mismatches). Please suggest. Im really confused.
Solution
This is common task which may be solved using FULL OUTER JOIN by any NOT NULL expression which is unique over each separate table and is present in both tables. Mismatches are detected by column compare, missing rows - by NULL value for joining expression in according table.
SELECT COALESCE(t1.id, t2.id) id,
CASE WHEN t1.id IS NULL
THEN 'Absent in TableA'
WHEN t2.id IS NULL
THEN 'Absent in TableB'
WHEN t1.columnX != t2.ColumnX
THEN 'Differs in ColumnX at least' -- may differ in another columns too
WHEN t1.columnY != t2.ColumnY
THEN 'Differs in ColumnY at least' -- but identical by columnX due to previous condition
-- and so on
ELSE 'Are identical'
END AS MismatchType
FROM tableA t1
FULL OUTER JOIN tableB t2 ON t1.id = t2.id
If you need detailed difference diagnosis then you must build complex conditions included all columns in interest, like
WHEN t1.columnX != t2.ColumnX AND t1.columnY != t2.ColumnY AND t1.columnZ = t2.ColumnZ
THEN 'Differs in ColumnX and ColumnY'