I'm trying to find a good way to check whether data I receive through a bulk load (SQL*Loader) already exists in my data set, so that I don't load it again.
Currently we have a setup like this:
TableA
col1, col2, bulkLoadName
This table would contain data like this:
col1, col2, bulkLoadName
"Joe", 35, "Load1"
"Tim", 65, "Load1"
"Ray", 95, "Load1"
"Joe", 35, "Load2"
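To make the problem with the current layout concrete, here is a minimal sketch that loads the example rows above and lists every col1/col2 pair stored in more than one row. It assumes SQLite (Python's built-in sqlite3 module) as a stand-in for Oracle 11g; the GROUP BY / HAVING query itself is plain SQL that Oracle also accepts. `build_current_table` and `find_duplicates` are hypothetical helper names.

```python
import sqlite3

def build_current_table():
    # In-memory SQLite database standing in for the real Oracle schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableA (col1 TEXT, col2 INTEGER, bulkLoadName TEXT)")
    conn.executemany(
        "INSERT INTO TableA (col1, col2, bulkLoadName) VALUES (?, ?, ?)",
        [("Joe", 35, "Load1"), ("Tim", 65, "Load1"),
         ("Ray", 95, "Load1"), ("Joe", 35, "Load2")],
    )
    return conn

def find_duplicates(conn):
    # A (col1, col2) pair stored in more than one row was loaded more than once.
    return conn.execute(
        """SELECT col1, col2, COUNT(*) AS copies
           FROM TableA
           GROUP BY col1, col2
           HAVING COUNT(*) > 1
           ORDER BY col1, col2"""
    ).fetchall()

conn = build_current_table()
print(find_duplicates(conn))  # [('Joe', 35, 2)]
```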
I'd like to change it to:
TableA
PK, col1, col2
TableAtoBulkLoadName
PK, TABLEA_PK, BulkLoadName_PK
BulkLoadName
PK, bulkLoadName
where the data would look like this:
TableA
PK, col1, col2
1, "Joe", 35
2, "Tim", 65
3, "Ray", 95
TableAtoBulkLoadName
PK, TABLEA_PK, BulkLoadName_PK
1, 1, 1
2, 2, 1
3, 3, 1
4, 1, 2
BulkLoadName
PK, bulkLoadName
1, "Load1"
2, "Load2"
This normalizes the data, so I can check for a specific load without a string search and, MOST importantly, it prevents me from loading duplicate data into the database just because a record is defined again in a later load.
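As a minimal sketch of the proposed layout, the three tables below carry UNIQUE constraints. It assumes SQLite (Python's built-in sqlite3 module) as a stand-in for Oracle 11g, and assumes (col1, col2) is what identifies a record. The constraints, rather than the table split itself, are what let the database reject a duplicate; the split is what makes it possible to declare UNIQUE (col1, col2) at all, since in the old layout the same pair legitimately appears once per load. `records_in_load` shows pulling a load's records by its integer key instead of a string search. In Oracle 11g you would use NUMBER/VARCHAR2 columns and fill PK from a sequence, since identity columns only arrived in 12c. The helper names are hypothetical.

```python
import sqlite3

# SQLite stand-in for the proposed Oracle schema.
SCHEMA = """
CREATE TABLE TableA (
    PK   INTEGER PRIMARY KEY,
    col1 TEXT    NOT NULL,
    col2 INTEGER NOT NULL,
    UNIQUE (col1, col2)                   -- each (col1, col2) pair is stored once
);
CREATE TABLE BulkLoadName (
    PK           INTEGER PRIMARY KEY,
    bulkLoadName TEXT NOT NULL UNIQUE     -- each load name is stored once
);
CREATE TABLE TableAtoBulkLoadName (
    PK              INTEGER PRIMARY KEY,
    TABLEA_PK       INTEGER NOT NULL REFERENCES TableA (PK),
    BulkLoadName_PK INTEGER NOT NULL REFERENCES BulkLoadName (PK),
    UNIQUE (TABLEA_PK, BulkLoadName_PK)   -- a record is linked to a load at most once
);
"""

def build_normalized():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO TableA VALUES (?, ?, ?)",
                     [(1, "Joe", 35), (2, "Tim", 65), (3, "Ray", 95)])
    conn.executemany("INSERT INTO BulkLoadName VALUES (?, ?)",
                     [(1, "Load1"), (2, "Load2")])
    conn.executemany("INSERT INTO TableAtoBulkLoadName VALUES (?, ?, ?)",
                     [(1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2)])
    return conn

def records_in_load(conn, load_pk):
    # Filter on the load's integer key rather than matching a name string.
    return conn.execute(
        """SELECT a.col1, a.col2
           FROM TableA a
           JOIN TableAtoBulkLoadName x ON x.TABLEA_PK = a.PK
           WHERE x.BulkLoadName_PK = ?
           ORDER BY a.PK""",
        (load_pk,),
    ).fetchall()

def insert_is_rejected(conn, col1, col2):
    # Returns True if UNIQUE (col1, col2) blocks the insert; otherwise the row is added.
    try:
        conn.execute("INSERT INTO TableA (col1, col2) VALUES (?, ?)", (col1, col2))
        return False
    except sqlite3.IntegrityError:
        return True

conn = build_normalized()
print(records_in_load(conn, 2))             # [('Joe', 35)]
print(insert_is_rejected(conn, "Joe", 35))  # True
```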
I'm having trouble deciding how to implement the duplicate checks. I'm not well versed in SQL and need a solution that works in Oracle 11g. I've looked around and come up with two possible solutions:
Solution 1:
Load the data into a temp (staging) table, and once the load finishes, run a stored procedure that checks for duplicates.
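Solution 1 could look roughly like the following sketch. It again assumes SQLite as a stand-in for Oracle 11g, and `staging` and `load_from_staging` are hypothetical names. The approach is set-based: once SQL*Loader has filled the staging table, one statement inserts only the rows TableA doesn't already have (INSERT ... SELECT ... WHERE NOT EXISTS), a second links every staged row to the new load, and the staging table is then cleared. In Oracle this body would live in a PL/SQL stored procedure and TableA.PK would come from a sequence.

```python
import sqlite3

SCHEMA = """
CREATE TABLE TableA (PK INTEGER PRIMARY KEY, col1 TEXT NOT NULL,
                     col2 INTEGER NOT NULL, UNIQUE (col1, col2));
CREATE TABLE BulkLoadName (PK INTEGER PRIMARY KEY, bulkLoadName TEXT NOT NULL UNIQUE);
CREATE TABLE TableAtoBulkLoadName (
    PK INTEGER PRIMARY KEY,
    TABLEA_PK INTEGER NOT NULL REFERENCES TableA (PK),
    BulkLoadName_PK INTEGER NOT NULL REFERENCES BulkLoadName (PK),
    UNIQUE (TABLEA_PK, BulkLoadName_PK));
CREATE TABLE staging (col1 TEXT, col2 INTEGER);   -- hypothetical SQL*Loader target
"""

def load_from_staging(conn, load_name):
    with conn:  # one transaction: commit on success, roll back on error
        load_pk = conn.execute(
            "INSERT INTO BulkLoadName (bulkLoadName) VALUES (?)", (load_name,)
        ).lastrowid
        # 1. Add only the staged rows that are not already in TableA.
        conn.execute("""
            INSERT INTO TableA (col1, col2)
            SELECT DISTINCT s.col1, s.col2 FROM staging s
            WHERE NOT EXISTS (SELECT 1 FROM TableA a
                              WHERE a.col1 = s.col1 AND a.col2 = s.col2)""")
        # 2. Link every staged row, new or pre-existing, to this load.
        conn.execute("""
            INSERT INTO TableAtoBulkLoadName (TABLEA_PK, BulkLoadName_PK)
            SELECT DISTINCT a.PK, ? FROM staging s
            JOIN TableA a ON a.col1 = s.col1 AND a.col2 = s.col2""", (load_pk,))
        # 3. Empty the staging table for the next load.
        conn.execute("DELETE FROM staging")
    return load_pk

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for name, rows in [("Load1", [("Joe", 35), ("Tim", 65), ("Ray", 95)]),
                   ("Load2", [("Joe", 35)])]:
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)  # what SQL*Loader would do
    load_from_staging(conn, name)

print(conn.execute("SELECT COUNT(*) FROM TableA").fetchone()[0])                # 3
print(conn.execute("SELECT COUNT(*) FROM TableAtoBulkLoadName").fetchone()[0])  # 4
```

Working on the whole staging table at once is generally faster than checking row by row, and the staging table also gives you a place to inspect or reject bad input before it touches the real tables.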
Solution 2:
Use a MERGE statement on TableA that adds new records to TableA, or creates a new intersection record in TableAtoBulkLoadName if the record already exists.
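For Solution 2, note that a MERGE only modifies its target table: MERGE INTO TableA ... WHEN NOT MATCHED THEN INSERT can add the missing records, but the TableAtoBulkLoadName row still needs its own INSERT, and it is needed for new and existing records alike. SQLite (used again as a stand-in for Oracle 11g) has no MERGE, so in this sketch INSERT OR IGNORE against the UNIQUE (col1, col2) constraint plays the role of the WHEN NOT MATCHED branch. `merge_row` and `load` are hypothetical helper names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE TableA (PK INTEGER PRIMARY KEY, col1 TEXT NOT NULL,
                     col2 INTEGER NOT NULL, UNIQUE (col1, col2));
CREATE TABLE BulkLoadName (PK INTEGER PRIMARY KEY, bulkLoadName TEXT NOT NULL UNIQUE);
CREATE TABLE TableAtoBulkLoadName (
    PK INTEGER PRIMARY KEY,
    TABLEA_PK INTEGER NOT NULL REFERENCES TableA (PK),
    BulkLoadName_PK INTEGER NOT NULL REFERENCES BulkLoadName (PK),
    UNIQUE (TABLEA_PK, BulkLoadName_PK));
"""

def merge_row(conn, load_pk, col1, col2):
    # Stand-in for MERGE ... WHEN NOT MATCHED THEN INSERT: skipped if the pair exists.
    conn.execute("INSERT OR IGNORE INTO TableA (col1, col2) VALUES (?, ?)", (col1, col2))
    (a_pk,) = conn.execute(
        "SELECT PK FROM TableA WHERE col1 = ? AND col2 = ?", (col1, col2)
    ).fetchone()
    # Separate statement: link the record (new or existing) to this load.
    conn.execute(
        "INSERT OR IGNORE INTO TableAtoBulkLoadName (TABLEA_PK, BulkLoadName_PK) "
        "VALUES (?, ?)", (a_pk, load_pk))
    return a_pk

def load(conn, load_name, rows):
    with conn:  # one transaction per load
        load_pk = conn.execute(
            "INSERT INTO BulkLoadName (bulkLoadName) VALUES (?)", (load_name,)
        ).lastrowid
        for col1, col2 in rows:
            merge_row(conn, load_pk, col1, col2)
    return load_pk

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(conn, "Load1", [("Joe", 35), ("Tim", 65), ("Ray", 95)])
load(conn, "Load2", [("Joe", 35)])
print(conn.execute("SELECT PK, col1, col2 FROM TableA ORDER BY PK").fetchall())
# [(1, 'Joe', 35), (2, 'Tim', 65), (3, 'Ray', 95)]
print(conn.execute(
    "SELECT PK, TABLEA_PK, BulkLoadName_PK FROM TableAtoBulkLoadName ORDER BY PK").fetchall())
# [(1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2)]
```

Running both example loads reproduces the tables from the question, with Joe stored once and linked to both loads. Processing one row at a time like this is simple to follow but is generally slower than a set-based statement on large loads.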
Now that all the background info is out there, my questions are: what are the pros and cons of these approaches? Is this kind of normalization normal? Are there standard ways of doing this sort of thing?
Thanks!