Question

Recently I noticed that the part of my ETL process that loads data into the staging area takes noticeably different amounts of time from run to run.

Using the following query (executed in the Integration Services catalog database), I compared two different runs and found that the recreate table statements take much longer in one of them (sometimes the loading part does too, but I think this is the main problem). Here's the query to compare the runs:

select
    es1.execution_path,
    es1.execution_duration as es1dura,
    es2.execution_duration as es2dura,
    -- es2 duration as a percentage of es1 duration; nullif avoids a divide-by-zero error
    (es2.execution_duration * 1.0) / nullif(es1.execution_duration, 0) * 100 as es2_pct_of_es1
from
    catalog.executable_statistics es1
    join catalog.executable_statistics es2 on es1.execution_path = es2.execution_path
where es1.execution_id = 239
    and es2.execution_id = 10290
    and es1.execution_path like '%create table%'
order by
    --(es2.execution_duration * 1.0) / es1.execution_duration * 100 desc
    es1.execution_path

Part of the result is this:

[screenshot: query result]

This corresponds to the following part of the ETL:

[screenshot: the corresponding part of the ETL package]

Please don't mind the additional green line in the screenshot; this is just messy formatting in Visual Studio. No further tasks are running in parallel, and the job runs around midnight, when no other job is running on the database.

These tasks are really simple. One of the Recreate table tasks for example looks like this:

IF EXISTS (
    SELECT * FROM sys.tables
    WHERE name = 'admin_perso_abteilung'
)
DROP TABLE admin_perso_abteilung
GO
CREATE TABLE admin_perso_abteilung (
    [id] int,
    [perso_abteilung] nvarchar(50)
)
GO

Anyway, the question is: since the query result suggests that the whole process is stuck for some time before it executes the recreate table statements in parallel, what could cause this? What can I inspect further to narrow down the problem?

Since I'm more of a developer than an administrator, I'm a bit lost here. Please guide me a little. Thanks.


Solution

(longer than comment... so posting as answer)

  • I would suggest not recreating the table every time. You can just truncate it, which will be more efficient, unless I am missing some dependency that you have not described in your question.
  • Set the DelayValidation property to True on the Data Flow Task.
  • Set ValidateExternalMetadata to False on the individual data flow components.
  • For speeding up your data load, you can refer to my answer here.
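
As a rough sketch of the first suggestion, the Recreate table task from the question could be replaced by something like the following. The dbo schema is an assumption (the question does not name a schema), and the column list is copied from the question as-is.

```sql
-- Create the staging table only if it does not exist yet
-- (dbo is an assumed schema; adjust to the schema actually used).
IF OBJECT_ID(N'dbo.admin_perso_abteilung', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.admin_perso_abteilung (
        [id] int,
        [perso_abteilung] nvarchar(50)
    );
END;

-- Empty the table before each load instead of dropping and recreating it.
TRUNCATE TABLE dbo.admin_perso_abteilung;
```

This only works if the staging table's structure stays the same between runs; if the columns change, the table still has to be altered or recreated once.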

OTHER TIPS

Chances are your drop/create statements are occasionally blocking each other. As already mentioned, you can truncate the tables instead. However, a fairly simple test would be to run the drop/create tasks sequentially and the data loads afterwards in parallel.
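
Before changing the package, it may be worth checking whether the drop/create tasks actually overlapped in the slow run. A minimal sketch, reusing the execution_id 10290 and the path filter from the question and the start_time / end_time columns of catalog.executable_statistics:

```sql
-- List the recreate-table tasks of one run in the order they started,
-- so overlapping start/end times (tasks running concurrently) become visible.
SELECT
    es.execution_path,
    es.start_time,
    es.end_time,
    es.execution_duration AS duration_ms
FROM catalog.executable_statistics AS es
WHERE es.execution_id = 10290
  AND es.execution_path LIKE '%create table%'
ORDER BY es.start_time;
```

If several tasks start at nearly the same time and all finish late together, that would be consistent with them waiting on each other, though it does not prove it.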

If you're desperate to know the detailed reasons behind the variable times, then do as suggested and run a trace or an Extended Events (XE) session to capture locking and blocking during the runs.
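
One possible way to set up such an XE session is sketched below. The session name etl_blocking, the file name, and the 5-second threshold are all made-up values for illustration, and changing server configuration requires the appropriate server-level permissions.

```sql
-- Have SQL Server raise a blocked process report for anything blocked
-- longer than 5 seconds (the threshold value is an arbitrary example).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold (s)', 5;
RECONFIGURE;

-- Hypothetical session name and target file; adjust both as needed.
CREATE EVENT SESSION [etl_blocking] ON SERVER
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file (SET filename = N'etl_blocking.xel')
WITH (STARTUP_STATE = OFF);

-- Start the session before the nightly run.
ALTER EVENT SESSION [etl_blocking] ON SERVER STATE = START;
```

After a slow run, stop the session with ALTER EVENT SESSION [etl_blocking] ON SERVER STATE = STOP and open the .xel file in Management Studio; each report should show the blocked and the blocking statement.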

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange