Question

I need to transfer about 11 million rows daily from one database to another. The source table is about half a billion rows in total at this point.

I was using the "get everything since ?" method, with the maximum value already in the destination as the ?, but the source is maintained in a kind of funky way: its owners keep going back to fill holes, so rows show up with keys below that maximum and my method misses them.
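To make the failure concrete, here is a small Python sketch with hypothetical rows, where an integer `id` stands in for whatever column the "since ?" filter uses. Once a hole below the destination's maximum is backfilled, a filter of `id > max` can never return that row.

```python
# Hypothetical rows; "id" stands in for whatever column the "since ?" filter uses.
def rows_since(rows, watermark):
    """The 'get everything since ?' extract: rows whose key is above the watermark."""
    return [r for r in rows if r["id"] > watermark]

source = [{"id": i} for i in (1, 2, 3, 4, 6)]  # id 5 is a hole
destination = list(source)                      # yesterday's load copied everything

source.append({"id": 5})                        # the source owners backfill the hole

watermark = max(r["id"] for r in destination)   # 6
new_rows = rows_since(source, watermark)
missed = {r["id"] for r in source} - {r["id"] for r in destination}

print(new_rows)  # → []
print(missed)    # → {5}
```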

The standard Lookup transform takes hours to run. Pragmatic Works' Task Factory has an Upsert component, but it's not in this project's budget.

Is there a better way than Lookup to look up?


Solution

Here are some options:

A. Reduce the input data by implementing some kind of CDC (at the volumes and data variability you're talking about, you should really consider this). What options do you have for CDC at the source? For example, can you create triggers and logging tables? Do you have a version of SQL Server that supports native CDC?

B. Load the input data into a staging table and use INSERT/UPDATE or MERGE to apply it to your target table.

C. Load the input data into a staging table and DELETE/INSERT (based on date ranges) to apply it to your target table. This is what I generally do. Your load process should be able to run for a given date range, extract only the data in that range, delete that range from the target and reload it.
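A minimal sketch of option A's trigger-and-logging-table idea, using Python's built-in `sqlite3` module with SQLite standing in for the source SQL Server database (all table and trigger names are hypothetical). The extract tracks its position by the log's own ever-increasing `log_id`, so a backfilled row with an old key still shows up as a new change.

```python
import sqlite3

# SQLite stands in for the source SQL Server database; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (id INTEGER PRIMARY KEY, amount INTEGER);

-- log_id keeps growing with every change, even when the changed row has an old id
CREATE TABLE change_log (
    log_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    source_id INTEGER NOT NULL,
    op        TEXT NOT NULL            -- 'I', 'U' or 'D'
);

CREATE TRIGGER source_ins AFTER INSERT ON source
BEGIN INSERT INTO change_log (source_id, op) VALUES (NEW.id, 'I'); END;

CREATE TRIGGER source_upd AFTER UPDATE ON source
BEGIN INSERT INTO change_log (source_id, op) VALUES (NEW.id, 'U'); END;

CREATE TRIGGER source_del AFTER DELETE ON source
BEGIN INSERT INTO change_log (source_id, op) VALUES (OLD.id, 'D'); END;
""")

def changes_since(conn, last_log_id):
    """Return (log_id, source_id, op) for every change logged after last_log_id."""
    return conn.execute(
        "SELECT log_id, source_id, op FROM change_log WHERE log_id > ? ORDER BY log_id",
        (last_log_id,),
    ).fetchall()

# Day 1: rows 1-4 and 6 arrive (5 is a hole); the extract processes all of them.
conn.executemany("INSERT INTO source (id, amount) VALUES (?, ?)",
                 [(i, i * 10) for i in (1, 2, 3, 4, 6)])
day1 = changes_since(conn, 0)
watermark = day1[-1][0]  # remember the last log_id, not the max source id

# Day 2: the owners backfill id 5 and correct id 1.
conn.execute("INSERT INTO source (id, amount) VALUES (5, 50)")
conn.execute("UPDATE source SET amount = 11 WHERE id = 1")

day2 = changes_since(conn, watermark)
print(day2)  # → [(6, 5, 'I'), (7, 1, 'U')]
```

On SQL Server the triggers would be written in T-SQL, or native CDC would replace them; the idea that carries over is keying the incremental extract on the change sequence rather than on the data's own key.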
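Option B can be sketched the same way (Python `sqlite3`, SQLite standing in for the target server, hypothetical table names): bulk-load the extract into a staging table, then apply it with one set-based UPDATE for keys that already exist and one INSERT for keys that don't, instead of a per-row lookup.

```python
import sqlite3

# SQLite stands in for the target database; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target  (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE staging (id INTEGER PRIMARY KEY, amount INTEGER);
INSERT INTO target VALUES (1, 10), (2, 20), (3, 30);
""")

def apply_staging(conn):
    """Apply the staging table to the target with two set-based statements."""
    with conn:  # one transaction: the UPDATE, INSERT and cleanup commit together
        # Keys already in target: overwrite with the staged values.
        conn.execute("""
            UPDATE target
               SET amount = (SELECT s.amount FROM staging AS s WHERE s.id = target.id)
             WHERE id IN (SELECT id FROM staging)
        """)
        # Keys not yet in target: insert them.
        conn.execute("""
            INSERT INTO target (id, amount)
            SELECT s.id, s.amount FROM staging AS s
             WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = s.id)
        """)
        conn.execute("DELETE FROM staging")  # ready for the next load

# Bulk-load today's extract into staging (no per-row lookups), then apply it.
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(2, 25), (4, 40), (5, 50)])
apply_staging(conn)

result = conn.execute("SELECT * FROM target ORDER BY id").fetchall()
print(result)  # → [(1, 10), (2, 25), (3, 30), (4, 40), (5, 50)]
```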
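A sketch of option C under similar assumptions (SQLite in place of SQL Server, a hypothetical `load_date` column stored as ISO-8601 text): the load takes a date range, deletes that range from the target and re-inserts it from staging, so backfilled or corrected rows inside the range are picked up without matching individual keys.

```python
import sqlite3

# SQLite stands in for the target database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target  (id INTEGER PRIMARY KEY, load_date TEXT NOT NULL, amount INTEGER);
CREATE TABLE staging (id INTEGER PRIMARY KEY, load_date TEXT NOT NULL, amount INTEGER);
CREATE INDEX ix_target_load_date ON target (load_date);  -- helps the range DELETE
INSERT INTO target VALUES
    (1, '2024-01-01', 10), (2, '2024-01-02', 20), (3, '2024-01-03', 30);
""")

def reload_range(conn, extracted_rows, start, end):
    """Replace target rows with start <= load_date < end by a fresh extract of that range.

    ISO-8601 date strings compare correctly as text, so plain >= / < works here.
    """
    with conn:  # one transaction: the range is either fully replaced or left untouched
        conn.execute("DELETE FROM staging")
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", extracted_rows)
        conn.execute("DELETE FROM target WHERE load_date >= ? AND load_date < ?",
                     (start, end))
        conn.execute("""INSERT INTO target (id, load_date, amount)
                        SELECT id, load_date, amount FROM staging
                         WHERE load_date >= ? AND load_date < ?""", (start, end))

# Re-extract 2024-01-02 up to (not including) 2024-01-04: row 2 changed, row 4 was backfilled.
reload_range(conn,
             [(2, '2024-01-02', 22), (3, '2024-01-03', 30), (4, '2024-01-02', 40)],
             '2024-01-02', '2024-01-04')

result = conn.execute("SELECT * FROM target ORDER BY id").fetchall()
print(result)
# → [(1, '2024-01-01', 10), (2, '2024-01-02', 22), (3, '2024-01-03', 30), (4, '2024-01-02', 40)]
```

Because the range is half-open (`start <= load_date < end`), consecutive runs can share a boundary date without loading that day twice.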

IMHO, the SSIS Lookup component is of no use at the data volumes you're talking about.

Other suggestions

I prefer to stretch a full refresh as far as it will go, e.g. truncate the target table and deliver all rows without any lookup etc. I have one like this that chews through nearly 1 billion rows in 3 hours. Most people are horrified by this approach initially, but it does work, and it is very reliable and easy to code & test.
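A minimal sketch of the full-refresh pattern, assuming two in-memory SQLite databases stand in for the source and destination servers (table names are hypothetical): empty the target, then stream every source row across in batches, with no lookups at all.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the source and destination servers.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, amount INTEGER)")
src.executemany("INSERT INTO source VALUES (?, ?)", [(i, i * 10) for i in range(1, 25_001)])
dst.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, amount INTEGER)")
dst.execute("INSERT INTO target VALUES (1, -1)")  # stale data from the previous run

def full_refresh(src, dst, batch_size=10_000):
    """Empty the target and copy every source row, with no per-row lookups."""
    with dst:  # if the copy fails, the transaction rolls back and the old rows remain
        dst.execute("DELETE FROM target")  # SQLite has no TRUNCATE; an unqualified DELETE plays that role
        cur = src.execute("SELECT id, amount FROM source")
        while True:
            batch = cur.fetchmany(batch_size)  # stream in batches to bound memory use
            if not batch:
                break
            dst.executemany("INSERT INTO target (id, amount) VALUES (?, ?)", batch)

full_refresh(src, dst)
count = dst.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # → 25000
```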

Alternatively, I would use an Execute SQL Task with a SQL MERGE statement. This gives you very detailed control over which source and target rows are considered, how they are matched, and what happens afterwards (insert or update).
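In T-SQL this would be a MERGE with WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT clauses, run from the Execute SQL Task. SQLite has no MERGE, so this hedged sketch uses its INSERT ... ON CONFLICT DO UPDATE upsert (SQLite 3.24 or later) to show the same kind of control: which rows count as matched (`ON CONFLICT (id)`), plus an extra condition so unchanged rows are not rewritten. Table names are hypothetical, and unlike MERGE this form cannot delete target rows that are missing from the source.

```python
import sqlite3

# SQLite has no MERGE; INSERT ... ON CONFLICT DO UPDATE is its closest single-statement
# equivalent for the insert-or-update case. Table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target  (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE staging (id INTEGER PRIMARY KEY, amount INTEGER);
INSERT INTO target  VALUES (1, 10), (2, 20);
INSERT INTO staging VALUES (1, 10), (2, 25), (3, 30);
""")

with conn:
    conn.execute("""
        INSERT INTO target (id, amount)
        SELECT id, amount FROM staging
         WHERE true                                  -- avoids a parse ambiguity with ON CONFLICT
        ON CONFLICT (id) DO UPDATE                   -- "matched": the id already exists in target
           SET amount = excluded.amount
         WHERE target.amount IS NOT excluded.amount  -- only touch rows whose value changed
    """)

result = conn.execute("SELECT * FROM target ORDER BY id").fetchall()
print(result)  # → [(1, 10), (2, 25), (3, 30)]
```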

At that scale I would be very careful to create indexes that help the MERGE, e.g. on the joined columns. It will often be much slower than the full refresh design, and it will take far longer to code & test, with a higher risk of bugs.
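One way to check that an index actually helps the matching step is to look at the query plan. This Python `sqlite3` sketch (a hypothetical `target` table matched on `src_id`; SQLite standing in for SQL Server, where you would read the execution plan instead) compares the plan for a single-key probe on the join column before and after creating an index on it. The exact plan wording varies between SQLite versions.

```python
import sqlite3

# Hypothetical target table keyed by a surrogate row_id and matched on src_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (row_id INTEGER PRIMARY KEY, src_id INTEGER, amount INTEGER)")

def plan(conn, sql):
    """Return the 'detail' column (the 4th column) of EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# The kind of per-key probe a MERGE or UPDATE can perform when matching on src_id.
probe = "SELECT amount FROM target WHERE src_id = 42"

before = plan(conn, probe)  # e.g. a full SCAN of target
conn.execute("CREATE INDEX ix_target_src_id ON target (src_id)")
after = plan(conn, probe)   # e.g. SEARCH target USING INDEX ix_target_src_id (src_id=?)

print(before)
print(after)
```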

Licensed under: CC-BY-SA with attribution