Can data and schema be changed with DB2/z load/unload?

https://stackoverflow.com/questions/11045756

14-06-2021

Question

I'm trying to find an efficient way to migrate tables with DB2 on the mainframe using JCL. When we update our application such that the schema changes, we need to migrate the database to match.

What we've been doing in the past is basically creating a new table, selecting from the old table into that, deleting the original and renaming the new table to the original name.

Needless to say, that's not a very high-performance solution when the tables are big (and some of them are very big).

With later versions of DB2, I know you can do simple things like alter column types, but we have migration jobs which need to do more complicated things to the data.

Consider for example the case where we want to combine two columns into one (firstname + lastname -> fullname). Never mind that it's not necessarily a good idea to do that, just take it for granted that this is the sort of thing we need to do. There may be arbitrarily complicated transformations to the data, basically anything you can do with a select statement.

My question is this. The DB2 unload utility can be used to pull all of the data out of a table into a couple of data sets (the load JCL used for reloading the data, and the data itself). Is there an easy way (or any way) to massage this output of unload so that these arbitrary changes are made when reloading the data?

I assume that I could modify the load JCL member and the data member somehow to achieve this but I'm not sure how easy that would be.

Or, better yet, can the unload/load process itself do this without having to massage the members directly?

Does anyone have any experience of this, or have pointers to redbooks or redpapers (or any other sources) that describe how to do this?

Is there a different (better, obviously) way of doing this other than unload/load?

Solution

As you have noted, SELECTing from the old table into the new table will have very poor performance. Poor performance here is generally due to the relatively high cost of inserting INTO the target table (index building and RI enforcement). The SELECT itself is generally not a performance issue. This is why the LOAD utility is generally preferred when large tables need to be populated from scratch: indices may be built more efficiently and RI enforcement may be deferred.

The UNLOAD utility allows unrestricted usage of SELECT. If you can SELECT data using scalar and/or column functions to build a result set that is compatible with your new table column definitions, then UNLOAD can be used to do the data conversion. Specify a SELECT statement in SYSIN for the UNLOAD utility. Something like:

 //SYSIN DD *
 SELECT CONCAT(FIRST_NAME, LAST_NAME) AS "FULLNAME"
 FROM OLD_TABLE
 /*

The resulting SYSRECxx file will contain a single column that is a concatenation of the two identified columns (result of the CONCAT function) and SYSPUNCH will contain a compatible column definition for FULLNAME - the converted column name for the new table. All you need to do is edit the new table name in SYSPUNCH (this will have defaulted to TBLNAME) and LOAD it. Try not to fiddle with the SYSRECxx data or the SYSPUNCH column definitions - a goof here could get ugly.
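If that table-name edit has to be repeated across many jobs, it can be scripted so that nothing else in SYSPUNCH is touched. The following is a minimal Python sketch, not part of the original answer. It assumes the generated load statement is available as plain text; the function name `retarget_load_statement`, the sample SYSPUNCH text and the target name `MYSCHEMA.NEW_TABLE` are all hypothetical, and real SYSPUNCH output will look different.

```python
import re


def retarget_load_statement(syspunch_text: str, new_table: str) -> str:
    """Replace the placeholder table name TBLNAME with new_table.

    Only the whole word TBLNAME is changed; the generated column
    definitions are passed through untouched.
    """
    # A callable replacement avoids backslash interpretation in new_table.
    updated, count = re.subn(r"\bTBLNAME\b", lambda _m: new_table, syspunch_text)
    if count != 1:
        # Refuse to guess if the placeholder is missing or appears twice.
        raise ValueError(f"expected exactly one TBLNAME placeholder, found {count}")
    return updated


# Hypothetical SYSPUNCH content, for illustration only.
SAMPLE_SYSPUNCH = """\
LOAD DATA INDDN SYSREC00 INTO TABLE
  TBLNAME
  (
  FULLNAME POSITION(1) VARCHAR
  )
"""

if __name__ == "__main__":
    print(retarget_load_statement(SAMPLE_SYSPUNCH, "MYSCHEMA.NEW_TABLE"))
```

Failing when the placeholder count is not exactly one is a deliberate choice: a silent no-op or a double replacement in a load statement is the kind of goof that is cheaper to catch before the job is submitted.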

Use the REPLACE option when running the LOAD utility to populate the new table (I think the default is LOAD RESUME which won't work here). Often it is a good idea to leave RI off when running the LOAD. This will improve performance and save the headache of figuring out the order in which LOAD jobs need to be run. Once finished, you need to verify the RI and build the indices.

The LOAD utility is documented here

OTHER TIPS

"I assume that I could modify the load JCL member and the data member somehow to achieve this but I'm not sure how easy that would be."

I believe you have provided the answer within your question. As to the question of "how easy that would be," it would depend on the nature of your modifications.

SORT utilities (DFSORT, SyncSort, etc.) now have very sophisticated data manipulation functions. We use these to move data around, substitute one value for another, combine fields, split fields, etc., albeit in a different context from what you are describing.
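To make "combine fields" concrete, here is a minimal Python sketch of the firstname + lastname -> fullname case from the question, applied to one fixed-width record. It is not DFSORT or SyncSort syntax, only an illustration of the kind of record reshaping such a step performs. The field widths (20 and 30) and the function name `combine_name_fields` are assumptions; real unloaded data is EBCDIC and may contain binary or packed columns, which this sketch ignores.

```python
def combine_name_fields(record: str, first_len: int = 20, last_len: int = 30) -> str:
    """Merge two fixed-width name fields into one fixed-width full-name field.

    The output field is first_len + last_len + 1 characters wide, so the
    combined value (with a separating space) always fits.
    """
    first = record[:first_len].rstrip()
    last = record[first_len:first_len + last_len].rstrip()
    # strip() handles the case where one of the two fields is blank.
    full = f"{first} {last}".strip()
    return full.ljust(first_len + last_len + 1)


if __name__ == "__main__":
    # Hypothetical input record: 20-byte first name, 30-byte last name.
    sample = "John".ljust(20) + "Smith".ljust(30)
    print(repr(combine_name_fields(sample)))
```

Any change to a field's width like this also has to be reflected in the matching column definition of the load control statement, which is why the two need to be modified together.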

You could do something similar with your load control statements, but that might not be worth the trouble. It will depend on the extent of your changes. It may be worth your time to attempt to automate modification of the load control statements if you have a repetitive modification that is necessary. If the modifications are all "one off" then a manual solution may be more expedient.

Licensed under: CC-BY-SA with attribution
