I'm attempting to design an Ab Initio load process without any Ab Initio training or documentation. Yeah, I know. A design decision is: the incoming data files will contain both inserts and updates. Should I have the feed provider split them into two data files (1-10 GB in size nightly) and have Ab Initio do the inserts and updates separately?

A problem I see with that is that data isn't always what you expect it to be... an insert row may already be present (perhaps a purge failed or the feed provider made a mistake), or an update row may not be present.

So I'm wondering if I should just combine all inserts and updates and use the Oracle MERGE statement (after parallel-loading the data into a staging table with no indexes, of course).

But I don't know whether Ab Initio supports MERGE or not.

There isn't much in the way of Ab Initio tutorials or docs on the web... can you direct me to anything good?


Solution 2

I would certainly not rely on a source system to tell me whether rows are present in the target table or not. My instinct says to go for a parallel, nologging (if possible), compressed (if possible) load into a staging table, followed by a merge. If Ab Initio does not support MERGE, then hopefully it supports a call to a PL/SQL procedure, or direct execution of a SQL statement.
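As a minimal sketch of the "wrap it in a PL/SQL procedure" option, assuming a hypothetical target table CUSTOMER and staging table STG_CUSTOMER joined on CUSTOMER_ID (none of these names come from the original post):

-- Hypothetical procedure wrapping the staging-to-target merge.
CREATE OR REPLACE PROCEDURE merge_stg_customer AS
BEGIN
  MERGE INTO customer tgt
  USING stg_customer src
     ON (tgt.customer_id = src.customer_id)
  WHEN MATCHED THEN
    UPDATE SET tgt.name       = src.name,
               tgt.updated_on = src.updated_on
  WHEN NOT MATCHED THEN
    INSERT (customer_id, name, updated_on)
    VALUES (src.customer_id, src.name, src.updated_on);
END;
/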

If this is a large amount of data I'd like to arrange hash partitioning on the join key for the new and current data sets too.
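A rough sketch of such a staging table (again, the table name and column list are placeholders): it could be created nologging, compressed, and hash-partitioned on the join key used by the merge.

-- Hypothetical staging table: nologging, compressed,
-- hash-partitioned on the merge key.
CREATE TABLE stg_customer (
  customer_id NUMBER,
  name        VARCHAR2(100),
  updated_on  DATE
)
NOLOGGING
COMPRESS
PARTITION BY HASH (customer_id) PARTITIONS 16;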

Other tips

The solution you depicted (loading the inserts and updates into a staging table and then merging the content into the main table) is feasible.

A design decision is: for the incoming data files there will be inserts and updates.

I don't know the background of this decision, but you should know that this solution will result in a longer execution time. In order to execute inserts and updates you have to use the "Update Table" component, which is slower than the simpler "Output Table" component. By the way, don't use the same "Update Table" component for inserts and updates simultaneously; use a separate "Update Table" for inserts and another one for updates instead (you'll see a dramatic performance boost this way). (If you can change the above-mentioned design decision, then use an "Output Table" instead.)

In either case, set the "Update Table"/"Output Table" components to "never abort" so that your graph won't fail if the same insert occurs twice or if there is no row to update.

Finally, the Oracle MERGE statement should be fired/executed from a "Run SQL" component once the processing of all the inserts and updates has finished. Use phases to make sure it happens in that order...
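For illustration, the SQL fired from that final phase could simply be the merge itself, or a call to a wrapper procedure like the hypothetical merge_stg_customer sketched above:

-- Hypothetical statement executed by the "Run SQL" component
-- after all staging-load phases have completed.
BEGIN
  merge_stg_customer;
  COMMIT;
END;
/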

If you intend to build a graph with parallel execution, then make sure that the insert and update records for the same entries are processed by the same partitions. (Use the primary key of the final table as the key in the "Partition by Key" component.)

If you want an overview of how many duplicate inserts or invalid updates occur in your messy input, then use the "Reject" (and possibly "Error") ports of the appropriate "Update Table"/"Output Table" components for further processing.
