Question

We're designing a data import from an external source such as MAS200 into our production SQL Server 2005/2008 database. The source will be a transactional database that is secured and isolated. We need to keep our database in sync with the source, so a periodic data sync is expected.

We're free to request the source data in any form: CSV or text files, or even another SQL database with a similar structure. We need to choose the best way to import the data; the import will be periodic and may run hourly or daily.

Based on my experience, I believe that having the source data in a SQL database might be the best way to get started. Here's a rough design of what we've come up with so far:

  1. Periodically, the source db will be populated externally (not our part).
  2. Preprocessing: polish the source table data (e.g. trim, lookup); general data formatting and transformation.
  3. Fetch: create a CURSOR to loop through the records. We plan to update existing data and insert new data, so we'll need at least two CURSOR loops, one after the other.
  4. Populate: within the CURSOR loops, records will be updated or inserted.
  5. Postprocessing: again, some final touches and lookup mapping (e.g. replacing codes with IDs).
  6. Check: finally, run a consistency check on the tables to ensure the integrity of the imported data.
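Steps 2, 5 and 6 can each be a single set-based statement rather than a row-by-row loop. The following is a minimal sketch under stated assumptions: the #Staging and #StateLookup tables, their columns and the sample rows are hypothetical (a customer row carrying a state code that must be mapped to a StateID). The lookup from step 5 is shown against the staging table; the same UPDATE ... FROM ... JOIN form works against the destination table.

```sql
-- Hypothetical tables and data, for illustration only; substitute your own schema.
IF OBJECT_ID('tempdb..#Staging') IS NOT NULL DROP TABLE #Staging;
IF OBJECT_ID('tempdb..#StateLookup') IS NOT NULL DROP TABLE #StateLookup;

CREATE TABLE #StateLookup (StateID INT PRIMARY KEY, StateCode CHAR(2) NOT NULL);
CREATE TABLE #Staging (
    CustomerID   INT PRIMARY KEY,
    CustomerName NVARCHAR(100) NULL,
    StateCode    VARCHAR(10)   NULL,
    StateID      INT           NULL  -- filled in by the lookup step
);

-- One INSERT per row, because SQL Server 2005 has no multi-row VALUES clause.
INSERT INTO #StateLookup (StateID, StateCode) VALUES (1, 'CA');
INSERT INTO #StateLookup (StateID, StateCode) VALUES (2, 'NY');
INSERT INTO #Staging (CustomerID, CustomerName, StateCode) VALUES (10, N'  Acme  ', ' ca');
INSERT INTO #Staging (CustomerID, CustomerName, StateCode) VALUES (11, N'Globex', 'NY ');
INSERT INTO #Staging (CustomerID, CustomerName, StateCode) VALUES (12, N'Initech', 'ZZ');

-- Step 2 (preprocessing): trim and normalize every row in one statement.
UPDATE #Staging
SET CustomerName = LTRIM(RTRIM(CustomerName)),
    StateCode    = UPPER(LTRIM(RTRIM(StateCode)));

-- Step 5 (lookup mapping): replace the code with its ID through a join.
UPDATE s
SET s.StateID = l.StateID
FROM #Staging AS s
JOIN #StateLookup AS l ON l.StateCode = s.StateCode;

-- Step 6 (consistency check): list rows whose code found no match in the lookup.
SELECT s.CustomerID, s.StateCode
FROM #Staging AS s
WHERE s.StateID IS NULL;
```

With the sample rows above, the check returns the row whose code ('ZZ') has no lookup entry; a real job step could RAISERROR when that query returns anything.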

To summarize, we're going to break the steps into stored procedures and then create a SQL Server Agent job that runs those steps one by one. I know there are many ways to do this (SSIS, the Data Import wizard, etc.), but we need to keep it simple, easily portable, with few dependencies, and flexible for future changes.
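The "one stored procedure per step, run by an Agent job" plan can be scripted with the msdb job procedures, which keeps the whole setup portable as a single .sql file. This is a sketch only: the job name, schedule name and the dbo.usp_Import_* procedure names are hypothetical placeholders, and the steps run in whichever database the script is executed from.

```sql
-- Sketch: a SQL Server Agent job that runs the import steps in order, every hour.
-- Job, schedule and stored procedure names are hypothetical placeholders.
DECLARE @db SYSNAME;
SET @db = DB_NAME();  -- run the job steps in the current database

-- Make the script re-runnable: drop an existing job with the same name
-- (sp_delete_job also removes schedules no other job uses, by default).
IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE name = N'Hourly source import')
    EXEC msdb.dbo.sp_delete_job @job_name = N'Hourly source import';

EXEC msdb.dbo.sp_add_job @job_name = N'Hourly source import';

-- on_success_action 3 = go to the next step, 1 = quit reporting success.
-- on_fail_action defaults to 2 (quit reporting failure), so a failed step stops the job.
EXEC msdb.dbo.sp_add_jobstep @job_name = N'Hourly source import',
    @step_name = N'Preprocess', @subsystem = N'TSQL', @database_name = @db,
    @command = N'EXEC dbo.usp_Import_Preprocess;', @on_success_action = 3;
EXEC msdb.dbo.sp_add_jobstep @job_name = N'Hourly source import',
    @step_name = N'Populate', @subsystem = N'TSQL', @database_name = @db,
    @command = N'EXEC dbo.usp_Import_Populate;', @on_success_action = 3;
EXEC msdb.dbo.sp_add_jobstep @job_name = N'Hourly source import',
    @step_name = N'Postprocess', @subsystem = N'TSQL', @database_name = @db,
    @command = N'EXEC dbo.usp_Import_Postprocess;', @on_success_action = 3;
EXEC msdb.dbo.sp_add_jobstep @job_name = N'Hourly source import',
    @step_name = N'Check', @subsystem = N'TSQL', @database_name = @db,
    @command = N'EXEC dbo.usp_Import_Check;', @on_success_action = 1;

-- freq_type 4 = daily, every 1 day; freq_subday_type 8 = repeat every N hours.
EXEC msdb.dbo.sp_add_schedule @schedule_name = N'Hourly source import schedule',
    @freq_type = 4, @freq_interval = 1,
    @freq_subday_type = 8, @freq_subday_interval = 1;
EXEC msdb.dbo.sp_attach_schedule @job_name = N'Hourly source import',
    @schedule_name = N'Hourly source import schedule';

-- Target the local server so SQL Server Agent actually runs the job.
EXEC msdb.dbo.sp_add_jobserver @job_name = N'Hourly source import',
    @server_name = N'(local)';
```

Because each step only quits with success on the last step, a failure in any earlier procedure stops the chain and shows up in the job history.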

NOTE: The data volume will be large. The last time we had a similar setup, the whole import process took about 20-25 minutes to complete, so we scheduled it hourly.

Thank you.


UPDATE #1: I understand that the MERGE command seems best. But what if I have to build this for SQL Server 2005? I believe MERGE is only available from 2008 onwards. I found this link:

http://sqlserver-tips.blogspot.com/2006/09/mimicking-merge-statement-in-sql.html

Any other ideas for 2005?


Solution

For items 3 and 4: if you are using SQL Server 2008, consider using the MERGE command rather than cursors and loops.
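A minimal MERGE sketch for SQL Server 2008 and later, assuming hypothetical #Staging and #Destination tables keyed on ID with one data column. One statement replaces both cursor loops: matched rows are updated only when a value changed, and unmatched staging rows are inserted.

```sql
-- Hypothetical tables and data, for illustration only.
IF OBJECT_ID('tempdb..#Staging') IS NOT NULL DROP TABLE #Staging;
IF OBJECT_ID('tempdb..#Destination') IS NOT NULL DROP TABLE #Destination;

CREATE TABLE #Destination (ID INT PRIMARY KEY, Name NVARCHAR(100) NOT NULL);
CREATE TABLE #Staging     (ID INT PRIMARY KEY, Name NVARCHAR(100) NOT NULL);

INSERT INTO #Destination (ID, Name) VALUES (1, N'Old name');
INSERT INTO #Destination (ID, Name) VALUES (2, N'Unchanged');
INSERT INTO #Staging (ID, Name) VALUES (1, N'New name');
INSERT INTO #Staging (ID, Name) VALUES (2, N'Unchanged');
INSERT INTO #Staging (ID, Name) VALUES (3, N'Brand new');

-- Upsert in a single statement (MERGE must be terminated with a semicolon).
MERGE #Destination AS d
USING #Staging AS s
    ON d.ID = s.ID
WHEN MATCHED AND d.Name <> s.Name THEN      -- skip rows that did not change
    UPDATE SET d.Name = s.Name
WHEN NOT MATCHED BY TARGET THEN             -- row exists only in staging
    INSERT (ID, Name) VALUES (s.ID, s.Name);
```

If staging always holds the complete source data set, adding `WHEN NOT MATCHED BY SOURCE THEN DELETE` would also propagate deletions; with partial or incremental loads that clause would wrongly delete rows, so it is left out here.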

If possible, retain an untouched copy of the source db; then, if there are any issues in data processing, you can track down the cause more easily.
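One way to keep that untouched copy inside the import itself is to snapshot each load into a raw archive table before any polishing, so later steps only modify a working copy. A minimal sketch, assuming hypothetical #Staging and #StagingRaw tables (in practice these would be permanent tables):

```sql
-- Hypothetical tables, for illustration only; use permanent tables in practice.
IF OBJECT_ID('tempdb..#Staging') IS NOT NULL DROP TABLE #Staging;
IF OBJECT_ID('tempdb..#StagingRaw') IS NOT NULL DROP TABLE #StagingRaw;

CREATE TABLE #Staging (ID INT PRIMARY KEY, Name NVARCHAR(100) NULL);
CREATE TABLE #StagingRaw (
    LoadedAt DATETIME      NOT NULL,  -- when this copy was taken
    ID       INT           NOT NULL,
    Name     NVARCHAR(100) NULL
);

INSERT INTO #Staging (ID, Name) VALUES (1, N'  Acme  ');
INSERT INTO #Staging (ID, Name) VALUES (2, N'Globex');

-- Capture the load time once so every row in this batch shares it
-- (separate DECLARE and SET, since SQL Server 2005 has no inline initialization).
DECLARE @LoadedAt DATETIME;
SET @LoadedAt = GETDATE();

-- Copy the data exactly as received, before any polishing.
INSERT INTO #StagingRaw (LoadedAt, ID, Name)
SELECT @LoadedAt, ID, Name
FROM #Staging;

-- Preprocessing now changes only the working copy; #StagingRaw stays untouched.
UPDATE #Staging SET Name = LTRIM(RTRIM(Name));
```

When a value looks wrong in the destination, comparing it with the matching #StagingRaw row for that LoadedAt shows whether the problem came from the source or from the processing steps.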

OTHER TIPS

Have you considered using SQL Server Integration Services? It sounds like your project is ideally suited to it.

I recently worked on a project that pulled data from various data sources (both databases and files), aggregated and cleaned it, and then pushed it into a relational SQL Server 2008 database. This was all pretty straightforward in SSIS.

As others have said, there's no need to use a cursor in this process; I also agree that SSIS is probably a better fit for this than you think (because it IS portable and configurable). However, if you want to do this in T-SQL, then I would advise that you replace your FETCH step with something like:

  1. Pull the data from the staging table you used to polish it.
  2. If you can't use the MERGE command (it requires SQL Server 2008 or later), you can emulate it with JOINs:

    -- rows to be updated
    SELECT * FROM staging JOIN destination ON staging.ID = destination.ID

    -- rows to be inserted (LEFT JOIN keeps staging rows that have no match in destination)
    SELECT * FROM staging LEFT JOIN destination ON staging.ID = destination.ID WHERE destination.ID IS NULL

Easy peasy, no CURSORS.
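Turned into actual write statements, the two JOIN queries become an UPDATE followed by an INSERT, wrapped in one transaction so a failure leaves the destination unchanged. This sketch runs on SQL Server 2005 (TRY...CATCH is available from 2005) and assumes hypothetical #Staging and #Destination tables keyed on ID.

```sql
-- Hypothetical tables and data, for illustration only.
IF OBJECT_ID('tempdb..#Staging') IS NOT NULL DROP TABLE #Staging;
IF OBJECT_ID('tempdb..#Destination') IS NOT NULL DROP TABLE #Destination;

CREATE TABLE #Destination (ID INT PRIMARY KEY, Name NVARCHAR(100) NOT NULL);
CREATE TABLE #Staging     (ID INT PRIMARY KEY, Name NVARCHAR(100) NOT NULL);

INSERT INTO #Destination (ID, Name) VALUES (1, N'Old name');
INSERT INTO #Destination (ID, Name) VALUES (2, N'Unchanged');
INSERT INTO #Staging (ID, Name) VALUES (1, N'New name');
INSERT INTO #Staging (ID, Name) VALUES (2, N'Unchanged');
INSERT INTO #Staging (ID, Name) VALUES (3, N'Brand new');

BEGIN TRY
    BEGIN TRANSACTION;

    -- 1) Update rows that already exist and whose values changed.
    UPDATE d
    SET d.Name = s.Name
    FROM #Destination AS d
    JOIN #Staging AS s ON s.ID = d.ID
    WHERE d.Name <> s.Name;

    -- 2) Insert rows that exist only in staging (LEFT JOIN ... IS NULL anti-join).
    INSERT INTO #Destination (ID, Name)
    SELECT s.ID, s.Name
    FROM #Staging AS s
    LEFT JOIN #Destination AS d ON d.ID = s.ID
    WHERE d.ID IS NULL;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo both statements, then re-raise so the calling job step fails.
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    DECLARE @msg NVARCHAR(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR(N'%s', 16, 1, @msg);
END CATCH;
```

Running the UPDATE before the INSERT means newly inserted rows are never touched by the update, and the `d.Name <> s.Name` filter avoids rewriting unchanged rows, which matters with large hourly loads. The columns here are NOT NULL; with nullable columns that comparison needs explicit NULL handling.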

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow