Question

I am doing a proof of concept for the SSIS CDC components.

For the initial load, this is what I have so far:

Step 1: CDC Control Task with the CDC control operation set to "Mark initial load start"
Step 2: Data flow that loads all source records into the destination
Step 3: CDC Control Task with the CDC control operation set to "Mark initial load end" (the rest of the settings are the same as for the CDC task in step 1)

The first time I run it, all the data loads fine. If I run the same package again, instead of picking up no records (the source has not changed), it reloads all the records, creating duplicates.

Doesn't it check the CDC state table created in step 1?

If anyone can point me to a good sample or tutorial, that would be great.


Solution

Yes, I got it. What I was doing wrong was using a single package for both the initial load and the incremental load. The other thing was that I hadn't created a step for 'Mark CDC Start' after the initial load. Now I have one package that does the full initial load and then marks the CDC start. From there onwards, my second package does the incremental loads.

OTHER TIPS

First of all, you need to understand that the CDC Control Task only deals with LSNs (log sequence numbers). It reads the LSN from the CDC-enabled database and saves it to a specified variable so that subsequent steps can use it. Optionally, it can persist the value to a database table so that subsequent package executions can use it.

Steps 1 and 3 essentially just record one LSN each in the CDC state: the maximum LSN at the time each step executes. While the initial load is running, other activity may be happening at the same time; these two steps give us the LSN range that covers the initial loading period.

The next run after the initial load should be an incremental load, which uses the CDC Source in the data flow to retrieve the transactions that happened during the initial load.
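To make the LSN bookkeeping concrete, here is a small toy model in Python. It is not SSIS code and does not call any SSIS or SQL Server API; the `ToySource` class, the `state` dictionary keys, and the function names are all made up for illustration. It only sketches the idea that the initial load is a full copy that never consults the saved state (which is why re-running it reloads everything), while the incremental load reads only the changes after the last saved LSN.

```python
from dataclasses import dataclass, field


@dataclass
class ToySource:
    """Hypothetical stand-in for a CDC-enabled source: a change log ordered by LSN."""
    changes: list = field(default_factory=list)  # tuples of (lsn, key, value)

    def write(self, key, value):
        lsn = len(self.changes) + 1  # toy LSN: just a running counter
        self.changes.append((lsn, key, value))
        return lsn

    def max_lsn(self):
        return len(self.changes)

    def snapshot(self):
        """Current rows, as a plain full-table read would see them."""
        rows = {}
        for _, key, value in self.changes:
            rows[key] = value
        return rows

    def changes_between(self, low_exclusive, high_inclusive):
        return [c for c in self.changes if low_exclusive < c[0] <= high_inclusive]


def initial_load(source, dest, state, concurrent_writes=()):
    state["initial_start"] = source.max_lsn()  # plays the role of "Mark initial load start"
    rows = source.snapshot()                   # full read; the saved state is never consulted
    for key, value in concurrent_writes:       # simulated activity while the load is running
        source.write(key, value)
    dest.update(rows)
    state["initial_end"] = source.max_lsn()    # plays the role of "Mark initial load end"
    state["last_processed"] = state["initial_start"]


def incremental_load(source, dest, state):
    low = state["last_processed"]
    high = source.max_lsn()                    # decide the LSN range for this run
    applied = 0
    for _, key, value in source.changes_between(low, high):
        dest[key] = value                      # upsert, so replaying a change is harmless
        applied += 1
    state["last_processed"] = high             # remember how far we got
    return applied


source = ToySource()
source.write("a", 1)
source.write("b", 2)
dest, state = {}, {}

initial_load(source, dest, state, concurrent_writes=[("c", 3)])
first = incremental_load(source, dest, state)   # picks up the write made during the load
second = incremental_load(source, dest, state)  # nothing changed, so nothing is applied
print(first, second, dest)
```

In this sketch the first incremental run starts from the LSN saved at the start of the initial load, so the write that happened during the load is not lost, and the second run applies nothing because the source has not changed. That is the behaviour the question was expecting from a second run, and it only comes from the incremental path, not from re-running the initial load.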

A great post from Matt Masson should give you more insight into CDC.

Licensed under: CC-BY-SA with attribution