Question

I'm trying to find a good way to check whether data given to me through a bulk load (SQL*Loader) already exists in my data set, so I don't load it again.

Currently we have a setup like this:

TableA
col1, col2, bulkLoadName

This table would contain data like:

col1, col2, bulkLoadName
"Joe", 35, "Load1"
"Tim", 65, "Load1"
"Ray", 95, "Load1"
"Joe", 35, "Load2"

And I'd like to change it to:

TableA
PK, col1, col2

TableAtoBulkLoadName
PK, TABLEA_PK, BulkLoadName_PK

BulkLoadName
PK, bulkLoadName

Where the data would look like:

PK, col1, col2
1, "Joe", 35
2, "Tim", 65
3, "Ray", 95

PK, TABLEA_PK, BulkLoadName_PK
1, 1, 1
2, 2, 1
3, 3, 1
4, 1, 2

PK, bulkLoadName
1, "Load1"
2, "Load2"

This normalizes the data so I can check for a specific load without a string search and, most importantly, prevents me from loading duplicate data into the database just because something is defined again in a later load.

I'm having trouble deciding how I should implement the duplicate checks. I'm not well versed in SQL and need a solution that works in Oracle 11g. I've looked around and come up with two possible solutions:

Solution 1:

Use a staging table to hold the bulk load, then run a stored procedure once it's loaded to check for duplicates.
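A rough sketch of what that stored procedure could look like. All object names here (LOAD_STAGE, the sequences, the procedure name) are illustrative, assuming SQL*Loader fills LOAD_STAGE(col1, col2) and the load name is passed in:

```sql
CREATE OR REPLACE PROCEDURE process_bulk_load(p_load_name IN VARCHAR2) AS
  v_load_pk BulkLoadName.PK%TYPE;
BEGIN
  -- Register the load itself.
  INSERT INTO BulkLoadName (PK, bulkLoadName)
  VALUES (BULKLOAD_SEQ.NEXTVAL, p_load_name)
  RETURNING PK INTO v_load_pk;

  -- Add only rows not already present in TableA.
  INSERT INTO TableA (PK, col1, col2)
  SELECT TABLEA_SEQ.NEXTVAL, s.col1, s.col2
  FROM (SELECT DISTINCT col1, col2 FROM LOAD_STAGE) s
  WHERE NOT EXISTS (SELECT 1 FROM TableA t
                    WHERE t.col1 = s.col1 AND t.col2 = s.col2);

  -- Link every staged row (new or pre-existing) to this load.
  INSERT INTO TableAtoBulkLoadName (PK, TABLEA_PK, BulkLoadName_PK)
  SELECT XREF_SEQ.NEXTVAL, t.PK, v_load_pk
  FROM TableA t
  WHERE EXISTS (SELECT 1 FROM LOAD_STAGE s
                WHERE s.col1 = t.col1 AND s.col2 = t.col2);
END;
/
```

Note the DISTINCT is pushed into an inline view because Oracle does not allow a sequence NEXTVAL in the same query block as DISTINCT.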

Solution 2:

Use a MERGE statement on TableA that adds new records to TableA, plus a new intersection record in TableAtoBulkLoadName whether or not the record already existed.
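One caveat with this approach: a single MERGE targets one table, so the intersection rows still need a second statement. A sketch, again with illustrative names (LOAD_STAGE as the source, :load_pk as the BulkLoadName key for the current load):

```sql
-- Insert only rows TableA doesn't already have.
MERGE INTO TableA t
USING (SELECT DISTINCT col1, col2 FROM LOAD_STAGE) s
ON (t.col1 = s.col1 AND t.col2 = s.col2)
WHEN NOT MATCHED THEN
  INSERT (PK, col1, col2)
  VALUES (TABLEA_SEQ.NEXTVAL, s.col1, s.col2);

-- Link all staged rows (new and pre-existing) to this load.
INSERT INTO TableAtoBulkLoadName (PK, TABLEA_PK, BulkLoadName_PK)
SELECT XREF_SEQ.NEXTVAL, t.PK, :load_pk
FROM TableA t
JOIN (SELECT DISTINCT col1, col2 FROM LOAD_STAGE) s
  ON t.col1 = s.col1 AND t.col2 = s.col2;
```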

My questions, now that all of the background info is out there: what are the pros and cons of these approaches? Is this kind of normalization normal? Are there standard ways of doing this sort of thing?

Thanks!


Solution

Strictly from a performance standpoint, if you can do everything in one statement, that's usually better.

But as soon as you start transforming the data in various ways, I personally find that using a staging table makes the resulting code a lot easier to read and modify.
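For comparison, in the original single-table layout "load only what's new" really can be one statement. A sketch, assuming the same illustrative LOAD_STAGE staging table:

```sql
-- Skip rows whose (col1, col2) already exist from an earlier load.
INSERT INTO TableA (col1, col2, bulkLoadName)
SELECT s.col1, s.col2, 'Load2'
FROM LOAD_STAGE s
WHERE NOT EXISTS (SELECT 1 FROM TableA t
                  WHERE t.col1 = s.col1 AND t.col2 = s.col2);
```

Once the normalized three-table design is in play, the multi-step staged version above is where the readability argument kicks in.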

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow