Move Records Which are older than 3 years from One Database (Production DB) to Another DB (Archive DB) Using SSIS

StackOverflow https://stackoverflow.com/questions/16234782

Question

Requirement is to Move Data Older than 3 years from Production DB to Archive DB , and Once Moved Delete those records from Production DB , so at any point of time Production DB will have only last three years of records.

i want to achieve this by SSIS , i read quite a few articles about Data Archival but couldn't figure out the best Approch.

I am New to SSIS

i want to achieve exactly something like this (answer given in Below link)with extra condition saying move only those records which are older than 3 years and then delete those records.

https://dba.stackexchange.com/questions/25867/moving-data-from-one-db-to-another-using-ssis

Criteria for an accepted answer answer should address

  • scalability
  • complexity
  • failure handling
  • reliability
Was it helpful?

Solution

You can use the OUTPUT clause to delete and return the data to be moved in one go.

create table ProductionTable
(
    ValueDate   datetime        not null
    , Data      varchar(max)    not null
)

insert ProductionTable values ('20100101', '3 years ago')
insert ProductionTable values ('20130425', 'this year')
insert ProductionTable values ('20130426', 'this year')

delete ProductionTable
output deleted.ValueDate, deleted.Data
where ValueDate <= dateadd(year, -3, getdate())

The code can also be accessed on SQLFiddle

Now I will show you the exact steps you need to follow in SSIS to reproduce the example:

  1. Create a new project and define your data sources for ProductionDB and ArchiveDB.
  2. In "Control Flow" tab, create a "Data Flow Task".
  3. In "Data Flow" tab, create a "OLE DB Source" and a "OLE DB Destination".
  4. In "OLE DB Source", select ProductionDB and choose "SQL command" as the data access mode. Paste in the delete statement with the output clause.
  5. Click on "Columns" and then OK.
  6. In "OLE DB Destination", select ArchiveDB and choose "Table or view - fast load" as the data access mode and then choose your ArchiveTable.
  7. Click on "Mappings" and then Ok.
  8. Run the package and you should be able to verify that one row is deleted from ProductionTable and moved to ArchiveTable.

Hope it helps.

Other things to keep in mind, because you are deleting and moving data around, transactional consistency is very important. Imagine half way through your delete/move, the server went down, you then end up with data being deleted but not made it to the archive.

If you are unsure about how to protect your data by enforcing transactional consistency, please seek help from other SQL/SSIS experts on how to use transactions in SSIS.

OTHER TIPS

Create 2 OLE DB Connection Managers. Name them Production and Archive and have them point to the correct servers and database. These CMs are what SSIS uses to push and pull data from the databases.

Add a Data Flow Task. A DFT is the executable that will allow row by row manipulation of the data. Double click on the Data Flow Task. Once inside, add an OLE DB Source and and OLE DB Destination to the canvas. The OLE DB Source is where the data will come from while the OLE DB Destination provides the insert power.

The logic you would want to implement is a Delete first approach, much as I outlined in the other answer.

DELETE
    DF
OUTPUT
    DELETED.*
FROM
    dbo.DeleteFirst AS DF
WHERE
    DF.RecordDate > dateadd(y, 3, current_timestamp);

This query will delete all the rows older than 3 years and push them into the dataflow. In your OLE DB Source, make the following configuration changes

  1. change the Connection Manager from Archive to Production
  2. change the query type from "Table or View" to "Query"
  3. paste your query and click the Columns tab to double check the query parsed

Connect the OLE DB Source to the OLE DB Destination. Double click on the OLE DB Destination and configure it

  1. Verify the Connection Manager is the Archive
  2. Ensure the Access Mode is "Table or View - Fastload" (name approximate)
  3. You might need to check the Retain IDs based on your table design - if you have identity column, then do check it if you want ID 10 from the production system to be ID 10 in the Archive system
  4. Select the actual table
  5. On the Mapping tab, ensure that all the columns mapped. It does this automatically by matching names so there shouldn't be a problem.

If you do not need to span an instance, the above logic can be condensed into a single Execute SQL Task

DELETE
    DF
OUTPUT 
    DELETED.*
    INTO
    ArchiveDatabase.dbo.DeleteFirst
FROM
    dbo.DeleteFirst AS DF
WHERE
    DF.RecordDate > dateadd(y, 3, current_timestamp);

Also note with this approach that if you have identity columns you will need to provide an explicit column list and turn on and off the IDENTITY_INSERT property.

Have you considered table partitioning instead? You can move the old records to a totally different disk and still leave them available in the same table. It can also help with performance in some cases... all without an SSIS package.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top