Why does DB2 suggest one table per tablespace?

https://stackoverflow.com/questions/9374338

28-10-2019

Question

The DB2 docs for DB2/z v10 have the following snippet in the tablespaces section:

As a general rule, you should have only one table in each table space.

But it doesn't actually provide any rationale for this.

We have some tables storing historical time-based information along the following lines (greatly reduced in complexity but should be enough to illustrate):

Table HOURLY_CPU_USAGE:
    RecDate        date
    RecTime        time
    Node           char(32)
    MaxCpuUsage    float
    primary key    (RecDate, RecTime, Node)

Table DAILY_CPU_USAGE:
    RecDate        date
    Node           char(32)
    MaxCpuUsage    float
    primary key    (RecDate, Node)

Table MONTHLY_CPU_USAGE:
    RecDate        date
    Node           char(32)
    MaxCpuUsage    float
    primary key    (RecDate, Node)

(the daily table has all the hourly records rolled up into a single day, and the monthly table does the same with the daily data, rolling it up into the row with date YYYY-MM-01).

Now it seems to me that these tables are all very similar in purpose, and I'm not certain why we'd want to keep them in separate tablespaces.

Discount for now the possibility of combining them into a single table; that's a suggestion I've made, but there are complications preventing it.

What is the rationale behind the "one table per tablespace" guideline? What are the exceptions, if any? I'm assuming there may be exceptions, since it seems very much a guideline rather than a hard-and-fast rule.

Solution

Just a wild guess... but maybe IBM recommends no more than one table per table space because many DB2 utilities operate at the level of the table space. If you put multiple tables into one table space, then the utilities operate on all of the tables as a unit.

For example, backup and restore work at the table space level. You cannot back up or restore individual tables within the same table space. They are all backed up or restored as a unit. I believe the same sort of thing applies to other utilities, and probably to many tuning parameters as well.

OTHER TIPS

These days the main reason for maintaining one table per table space is an administrative one. Most DB2 utilities work at the table space level. For example, if you perform a LOAD REPLACE on a table space for a specific table, then all the other tables will end up empty, as the first thing the LOAD REPLACE does is to delete all rows.
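To make the LOAD REPLACE point concrete, here is a minimal toy model in Python. It is not DB2 code and calls no DB2 API; the TableSpace class, the TSCPU name and the sample rows are all made up for illustration. It only mimics the behaviour described above: the whole table space is emptied first, then the one target table is loaded.

```python
# Toy model only: this is not DB2 code and does not call any DB2 API.
from dataclasses import dataclass, field


@dataclass
class TableSpace:
    """A hypothetical table space holding rows for one or more tables."""
    name: str
    tables: dict = field(default_factory=dict)  # table name -> list of rows

    def load_replace(self, table, new_rows):
        """Mimic LOAD REPLACE: empty the whole table space, then load one table."""
        for rows in self.tables.values():
            rows.clear()  # every table in the space loses its rows
        self.tables[table].extend(new_rows)


# Two tables sharing one (hypothetical) table space.
shared = TableSpace("TSCPU", {
    "HOURLY_CPU_USAGE": [("h1",), ("h2",)],
    "DAILY_CPU_USAGE": [("d1",)],
})
shared.load_replace("DAILY_CPU_USAGE", [("d2",), ("d3",)])

row_counts = {name: len(rows) for name, rows in shared.tables.items()}
print(row_counts)  # {'HOURLY_CPU_USAGE': 0, 'DAILY_CPU_USAGE': 2}
```

With one table per table space, the same operation could only ever touch the table you meant to reload.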

So "why wouldn't you keep one table per table space?". I think it's reasonable and even desirable to include multiple tables in a single table space when the tables are related to the extent that one is useless without the other, e.g. CustomerTable + NextCustomerIDTable.

Another consideration is the type of table space. Depending on the type of table space you have created, there could be performance implications with creating multiple tables in a single table space. If you are not using segmented table spaces, a table space scan will read all pages in the table space, including the pages from other tables. See the "Table space scan" topic here: http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2.doc.ve%2Fdvnhlpcn_tablescan.htm
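As a rough illustration of the scan cost, here is a small Python sketch. Again this is a toy model, not DB2 code: the page counts and the BIG/SMALL table names are invented, and a page is simply tagged with the table that owns it. It assumes the behaviour described above, namely that a scan of a non-segmented table space reads every page, while a segmented one reads only the pages belonging to the table being scanned.

```python
# Toy model only: each list entry is a page, tagged with the table owning it.
def pages_read_simple(pages, table):
    """Non-segmented: a scan reads every page, whichever table is wanted."""
    return len(pages)


def pages_read_segmented(pages, table):
    """Segmented: only pages in segments owned by `table` are read."""
    return sum(1 for owner in pages if owner == table)


# Invented sizes: 1000 pages for a big table, 10 for a small one,
# both living in the same table space.
pages = ["BIG"] * 1000 + ["SMALL"] * 10

simple_cost = pages_read_simple(pages, "SMALL")
segmented_cost = pages_read_segmented(pages, "SMALL")
print(simple_cost, segmented_cost)  # 1010 10
```

The small table pays for the big one on every scan in the non-segmented case, which is one reason sharing a table space used to be discouraged.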

It seems that they have changed the text in their documentation.

The link provided in the question now contains the following information:

The number of tables that you should define in a table space depends on the characteristics of the tables:

If a table might become large in size, it is better to put the table in its own table space. This design simplifies performance tuning, and in particular, buffer pool tuning.

For smaller tables, multiple-table segmented table spaces are better. This design helps to reduce the number of data sets that need to be managed for backup and recovery, and the number of data sets that the database system needs to open and close during DB2 operations.

It is better to minimize the number of table spaces in each database for the following reasons:

During execution of data definition statements, the database system holds an exclusive lock on the entire database until a commit operation is executed. The exclusive lock performs the following functions: The exclusive lock prevents concurrent executions of data definition statements for tables and indexes in the same database. If the dynamic statement cache is disabled (subsystem parameter CACHEDYN=NO), the database system uses the database lock to serialize execution of data definition statements and dynamic SQL statements that access tables and indexes in the database. If fewer table spaces are in the database, fewer table spaces are concurrently locked.

During execution of the SWITCH phase of online REORG utility operations, the database system obtains an exclusive lock on the entire database to serialize execution of online REORG operations and data definition statements on tables and indexes in the database. If fewer tables are in the database, fewer tables are concurrently locked.

The logging volume for data definition statements is smaller when fewer table spaces are in the database.

Generally it's because the performance options tend to be better for "one table per tablespace" configurations. For example, there is the ability to do a Limited Partition Scan for certain queries if the table is partitioned (which REQUIRES one table per table space).

(But as a mainframe performance person I would say that, wouldn't I?) :-)
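For anyone unfamiliar with the limited partition scan idea mentioned above, here is a minimal Python sketch of the principle. It is a toy model, not DB2 code: the one-partition-per-month layout for 2019 and the PARTnn names are assumptions made up for the example. The point is only that a date-range predicate lets the scan skip partitions that cannot contain matching rows.

```python
# Toy model only: not DB2 code; layout and names are hypothetical.
from datetime import date

# Assumed layout: one partition per month of 2019, keyed by (year, month).
partitions = {(2019, m): f"PART{m:02d}" for m in range(1, 13)}


def partitions_to_scan(start, end):
    """Return the partitions whose month overlaps the [start, end] date range."""
    lo, hi = (start.year, start.month), (end.year, end.month)
    return [name for key, name in sorted(partitions.items()) if lo <= key <= hi]


# A query on RecDate between 15 March and 2 April touches two partitions only.
scanned = partitions_to_scan(date(2019, 3, 15), date(2019, 4, 2))
print(scanned)  # ['PART03', 'PART04']
```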

Licensed under: CC-BY-SA with attribution