How to increase ETL performance in Informatica for Netezza as a source and SQL Server as a target?

dba.stackexchange https://dba.stackexchange.com/questions/111584

29-09-2020

Question

What settings or configuration on the Informatica server, in the Informatica software itself, or on the database servers can be changed to increase Informatica ETL throughput? What are some benchmarks we can set to troubleshoot performance? We are specifically using Netezza as a source and SQL Server as a target.

Please exclude multi-threading and Informatica partitioning from this question.

This is what we've done in the past:

  • restart servers every so often
  • remove indexes on target tables in SQL Server before the ETL load and rebuild them afterwards (see the sketch after this list)
  • increase the commit interval
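
For the index step, a minimal T-SQL sketch, assuming a hypothetical target table dbo.FactSales with a nonclustered index ix_FactSales_CustomerId (only nonclustered indexes should be disabled; a disabled clustered index makes the table unwritable):

    -- Before the Informatica load: disable the nonclustered index so the
    -- writer does not have to maintain it row by row.
    ALTER INDEX ix_FactSales_CustomerId ON dbo.FactSales DISABLE;

    -- ... PowerCenter session loads dbo.FactSales here ...

    -- After the load: rebuild the index in a single bulk operation.
    ALTER INDEX ix_FactSales_CustomerId ON dbo.FactSales REBUILD;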

Solution

If Informatica PowerCenter is the bottleneck (and not Netezza, SQL Server, or the network), then there is a setting in the PowerCenter session that may help. IIRC it's called "record buffer size" or something similar. Change it from Default to 512 MB. If that helps, it is best to reduce it to something more sensible by experimenting with lower values.

This setting is not the size used to hold a single record in memory, but it does need to be large enough to fit at least one record. The built-in help is a bit vague on this point.

In the opposite scenario (SQL Server -> Netezza) I've noticed that datatypes can play a part as well. PowerCenter can grossly overestimate the amount of memory it needs to reserve for a single record if the source table layout contains LONG/NTEXT/VARBINARY datatypes. Netezza doesn't have those, but perhaps it also matters if the target contains large fields.

The PowerCenter session log file should contain some information about how much memory it reserves to transfer the data. If that amount is too low, it can become the bottleneck.

OTHER TIPS

My experience with Informatica accessing both SQL Server and Netezza can be summed up as follows:

  1. Reading is equally fast (100,000 rows/sec; sometimes twice that speed), provided that the select SQL is simple enough, that is:

    • no joins;

    • no group by;

    • no sort;

    • where clauses only against the clustered index key of the SQL Server table.

    In all other cases Netezza will beat SQL Server.

  2. Inserts with PowerCenter are array-based, and SQL Server can usually receive 2,000 to 4,000 rows per second.

    Inserts into Netezza run at 80,000 to 200,000 rows per second if PowerCenter is not the bottleneck.

  3. Updates/deletes with PowerCenter have been notoriously slow on all databases for years, because the SQL is executed as singleton statements rather than arrays. A session therefore typically drops to ~500 rows per second with SQL Server as a target, and a mere 8 rows per second with Netezza.

    Because of that, we have defined one staging table per PowerCenter target in Netezza and write all updates/deletes to those tables while the session executes. The changes are then applied in bulk in the target post-SQL (see the sketch below). This scales well, since all write operations now run at speeds similar to inserts.
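
A minimal sketch of that pattern, assuming a hypothetical Netezza target table TGT_ORDERS keyed on ORDER_ID and hypothetical staging tables STG_ORDERS_UPD and STG_ORDERS_DEL that the session writes to; the statements would run as target post-SQL:

    -- Apply all captured updates in one set-based statement instead of
    -- singleton UPDATEs (Netezza supports UPDATE ... FROM).
    UPDATE TGT_ORDERS
       SET STATUS = s.STATUS,
           AMOUNT = s.AMOUNT
      FROM STG_ORDERS_UPD s
     WHERE TGT_ORDERS.ORDER_ID = s.ORDER_ID;

    -- Apply all captured deletes in one statement.
    DELETE FROM TGT_ORDERS
     WHERE EXISTS (SELECT 1
                     FROM STG_ORDERS_DEL d
                    WHERE d.ORDER_ID = TGT_ORDERS.ORDER_ID);

    -- Empty the staging tables for the next run.
    TRUNCATE TABLE STG_ORDERS_UPD;
    TRUNCATE TABLE STG_ORDERS_DEL;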

The best solution for the notorious writer bottleneck with SQL Server as a target is to spend a lot of time inside PowerCenter comparing source with target and writing only the differences you detect. That will carry you a long way, but it does not scale.
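
Inside PowerCenter that comparison is usually built with Lookup or Joiner transformations, but the underlying idea can be sketched in T-SQL, assuming a hypothetical staged copy of the source extract (dbo.STG_ORDERS) and the target table dbo.TGT_ORDERS:

    -- Only rows that are new or changed relative to the target are
    -- passed on to the writer; unchanged rows are filtered out.
    SELECT s.ORDER_ID, s.STATUS, s.AMOUNT
      FROM dbo.STG_ORDERS AS s
    EXCEPT
    SELECT t.ORDER_ID, t.STATUS, t.AMOUNT
      FROM dbo.TGT_ORDERS AS t;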

My final question is: why move the data from Netezza to SQL Server at all? If your business has requirements that truly cannot be met on Netezza, please specify which; perhaps they could be addressed.

Licensed under: CC-BY-SA with attribution