Question

In a Data Flow, suppose I have to derive a new column, myDate, whose value will be GetDate() or sysdatetime(), whichever. I have two options. I can add this column in the OLE DB Source select, like:

Select 
    col1, col2, ..., getDate() as myDate 
from myTable

Or I can add an SSIS "Derived Column" step and put my getDate() expression there.

Which option is better, in terms of performance?


Solution

It depends. There's a subtle difference between using GetDate()/current_timestamp in your query versus using it in a Derived Column.

The getdate/current_timestamp in your source query is going to be evaluated once. Thus your myDate value will be constant for the entirety of your data set. Maybe that's correct for your application, maybe it's not.
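One way to check this behaviour for yourself is a quick query like the sketch below. It assumes SQL Server and uses sys.all_objects purely as a convenient row source (that choice is an assumption, not part of the original question); any table with enough rows should do.

```sql
-- Sketch: count how many distinct values GETDATE() yields across the rows of one statement.
-- Assumption: sys.all_objects is only a stand-in for a table with many rows.
SELECT COUNT(*)                 AS row_count,
       COUNT(DISTINCT x.myDate) AS distinct_myDate_values
FROM (
    SELECT GETDATE() AS myDate
    FROM sys.all_objects
) AS x;
```

If the function is evaluated once for the statement, distinct_myDate_values should come back as 1 regardless of how many rows row_count reports.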

The getdate expression in a Derived Column, by contrast, is going to be evaluated every N interval while the data flow runs. For large imports, you can get a sizable drift in the value. That is, the first row committed has 2014-02-06T11:46:00.000 while the last row has 2014-02-06T15:21:19.762. Again, whether this is the more desirable behaviour is up to you and your application. In our case, it complicated the queries I had to write to correlate import behaviour in our DW, as we could only correlate import activities based on dates.
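To see how much drift a given load picked up, a query along these lines can help. It is only a sketch: dbo.myDestinationTable is a hypothetical name for the table the data flow writes to, myDate is the column from the question, and the table is assumed to hold a single load.

```sql
-- Sketch: measure the spread of myDate values written by one load.
-- Assumption: dbo.myDestinationTable is a hypothetical destination table
-- containing only the rows of the load being inspected.
SELECT MIN(myDate)                                AS first_value,
       MAX(myDate)                                AS last_value,
       DATEDIFF(SECOND, MIN(myDate), MAX(myDate)) AS drift_seconds
FROM dbo.myDestinationTable;
```

A drift_seconds of 0 would suggest every row received the same timestamp; a large value indicates the kind of spread described above.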

If you wish to use a Derived Column but have a consistent reference point, use one of the System namespace variables. We often used @[System::StartTime], but @[System::ContainerStartTime] would provide a start time more closely associated with the beginning of your data flow.

Other suggestions

Doing it on the database would be better since you don't have to use up memory to create a new column for each row in the data flow.

I think it would be better to do it inside the SELECT statement rather than in a Derived Column inside SSIS.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow