Question

SETUP: I have a very large SSIS package that sorts through a lot of different flat files and loads them into tables that fit the file definitions to a T... except the dates. The dates come over in the format "MM/dd/yyyy HH:mm:ss.ffffff". For DT_DBTIMESTAMP, this needs to be converted to "yyyy-MM-dd HH:mm:ss.ffffff". Any given file has at least two dates, created and updated; some have more.

PROBLEM: Of the 26 files I parse, one is considerably larger than the others; in some cases it can be 40+ MB. The package seemed to run fine before, but the derived column component that derives three dates is now EXTREMELY slow. It takes around 30 minutes to parse ~90,600 rows.

I'm watching the data flow as it executes, and the bottleneck looks very much like the derived column. I redirect and report any error rows, and there are none, so I know it's not choking on bad rows... I just can't figure out what is taking so long. CPU shoots up to 100% while executing (no big surprise), but memory stays low at around 12%.

Here's the exact transformation applied to each of the three dates on every line:

(DT_DBTIMESTAMP)(SUBSTRING(COLUMN,7,4) + "-" + SUBSTRING(COLUMN,1,2) 
 + "-" + SUBSTRING(COLUMN,4,2) + SUBSTRING(COLUMN,11,14))

Any ideas?

Solution

I had copied the database because of the extensive changes, and the indexes were not brought over properly, which slowed inserts into the biggest table. Removing the derived column step helped me figure out that the bottleneck was further down the line, even though monitoring the SSIS data flow execution made it seem as though it was the derived column step.
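
In case it helps anyone else checking for the same issue, a quick way to confirm whether indexes survived a database copy is to query sys.indexes on the destination table. This is a minimal sketch; dbo.BigTable is a hypothetical stand-in for the actual table name:

-- List the indexes on the destination table and whether any are disabled
SELECT name, type_desc, is_disabled
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.BigTable');

-- Rebuild everything on the table if anything came over disabled
ALTER INDEX ALL ON dbo.BigTable REBUILD;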

I realize this probably won't happen often, but I'll leave the thread here in case anyone else hits the problem and misdiagnoses it like I did!
