In DB2 LUW, when should I use a 4K, 8K, or 16K tablespace rather than just creating a 32K tablespace and being done with it?

dba.stackexchange https://dba.stackexchange.com/questions/239991

Question

We're using DB2 LUW 10.5 & 11.1 on Windows & Linux systems, in case it's relevant to the answers.

Question: Is there a time when it would be correct to use 4K rather than 32K? If so, why? (Does it perform better when it can be used?) Or is it just a legacy appendage from some prehistoric time when 4K was the only page size?

Background: I have always just created 4K, 8K, 16K, and 32K tablespaces and associated bufferpools when I create DB2 databases.

My manager is challenging me on this. (Good for him - I should know this!) He thinks we should just create a 32K tablespace and be done with it.

I'm not able to find anything that tells me, for instance, that when the row size allows it, we should use 4K rather than 32K because XYZ. Everything I find tells me I CAN do this, but not that, or when, I should.


Solution

It's a good question, but the answer is worthy of a whole chapter in a book that does not exist, sadly. Ember Crooks' article you're linking to is a good overview; I'll add here some random factors that one might want to take into consideration when deciding on the tablespace page sizes.

TL;DR.

Choose one page size that is most suitable for your data, considering the points below. If your performance testing shows problems that can be addressed by moving some tables to a tablespace with a different page size, do it judiciously.

Decision factors.

As you have mentioned, table row width determines the minimum page size required to accommodate a row (a quick way to check a table is sketched below). It doesn't follow, though, that you always want "the smallest that will work for your data".
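A rough way to see where a given table stands is to sum the declared column lengths from the catalog. This is only a sketch (MYSCHEMA.MYTAB is a placeholder, LENGTH is the declared length rather than the exact byte count for every type, and the true on-page row size adds per-row and per-column overhead):

    -- Approximate row width of one table from the catalog.
    SELECT SUM(LENGTH) AS approx_row_bytes
    FROM SYSCAT.COLUMNS
    WHERE TABSCHEMA = 'MYSCHEMA'
      AND TABNAME = 'MYTAB';

For comparison, the documented maximum row lengths are roughly 4005 bytes on 4K pages, 8101 on 8K, 16293 on 16K, and 32677 on 32K pages.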

Firstly, the usual arguments of "avoiding unnecessary I/O" and "handling less data at a time" with smaller page sizes may be a bit misplaced. If your tablespace container sits on a ZFS file system on an LVM volume on a VMware virtual disk on a Ceph volume on an unknown number of RAID6 devices that may use spinning disks or SSDs, do you really know how much physical I/O your 4K (or 32K) read request will cause?

Smaller page sizes certainly help if your workload creates tablespace hotspots (most I/O requests go to a limited number of pages) that cannot be resolved by other means. In such situations smaller pages can improve bufferpool efficiency and reduce page latch waits between agents competing for access to the same page. On the other hand, smaller page size means longer LRU chains and hence possibly less efficient page cleaning.
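Before moving tables around, it's worth checking whether a bufferpool is actually struggling. A minimal sketch using the standard monitoring function (data reads only; index, XDA, and temporary reads are omitted for brevity):

    -- Logical vs. physical data reads per bufferpool, across all members.
    SELECT BP_NAME,
           POOL_DATA_L_READS,
           POOL_DATA_P_READS,
           CASE WHEN POOL_DATA_L_READS > 0
                THEN DEC((POOL_DATA_L_READS - POOL_DATA_P_READS) * 100.0
                         / POOL_DATA_L_READS, 5, 2)
           END AS DATA_HIT_PCT
    FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS T;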

There are also arguments for larger page sizes.

Presence of LOB data.

Normally LOB data are stored outside table rows, in separate data structures that have several performance disadvantages:

  • You can only access them via synchronous direct reads and writes that bypass the database bufferpools; the only caching available is that of the underlying file system, if enabled for the tablespace; direct reads also don't take advantage of page prefetching.
  • Since LOBs are not loaded into the bufferpools, repeated access to the same data will cause repeated direct read requests.
  • They are not compressed, even if table compression is enabled.

If most of your LOB values are relatively small and could fit into the row itself given a larger page size (as is often the case), you could store them inline, alleviating those disadvantages.
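For illustration, inlining is declared per LOB column. A minimal sketch (table, column, and tablespace names are invented, and the INLINE LENGTH must still fit within the row limit of the tablespace's page size):

    -- Values up to 8000 bytes are stored in the row itself, so they are
    -- buffered and compressed like regular data; longer values still go
    -- to separate LOB storage.
    CREATE TABLE docs (
        id  INTEGER NOT NULL PRIMARY KEY,
        doc CLOB(10M) INLINE LENGTH 8000
    ) IN ts32k;

    -- Estimate the INLINE LENGTH existing values would need:
    SELECT MAX(ADMIN_EST_INLINE_LENGTH(doc)) FROM docs;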

Compression.

Larger page sizes improve the efficiency of adaptive (page-level) compression, and often the I/O reduction that compression affords outweighs its CPU cost.
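Enabling it is a table-level clause. A sketch with invented names (existing rows are only compressed after a subsequent REORG):

    -- Adaptive compression = classic row (dictionary) compression plus
    -- page-level compression; larger pages give the page-level part
    -- more repeating data to work with.
    CREATE TABLE sales (
        id   BIGINT NOT NULL,
        note VARCHAR(200)
    ) IN ts32k COMPRESS YES ADAPTIVE;

    -- Or for an existing table (then REORG TABLE sales to compress
    -- the rows already in place):
    ALTER TABLE sales COMPRESS YES ADAPTIVE;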

Don't forget temp tablespaces.

Even if each of your tables individually can be placed into a 4K tablespace, a system temporary tablespace with a larger page size (and a corresponding bufferpool) might be required: if a query joins sub-4K rows from two or more tables, the width of the result rows might exceed the 4K limit, and if the query needs to spill to disk, an appropriately sized temporary tablespace will be needed.
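Covering that case is cheap. A minimal sketch, assuming no 32K bufferpool exists yet (names are made up):

    -- A tablespace can only use a bufferpool of the same page size.
    CREATE BUFFERPOOL bp32k SIZE AUTOMATIC PAGESIZE 32K;
    CREATE SYSTEM TEMPORARY TABLESPACE tmpsys32k
        PAGESIZE 32K BUFFERPOOL bp32k;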

It's worth mentioning that creating tablespaces of each possible page size "just in case" is not a good idea, because, as you said, each requires a dedicated bufferpool, and multiple bufferpools, unless necessary, are almost always worse (less efficient) than a single large one.

OTHER TIPS

I finally did hit just the right Google search and found a note on a post from Ember Crooks (https://datageek.blog/2013/07/09/db2-luw-what-is-a-page/) that says smaller pages are preferred for OLTP or e-commerce sites, so you're handling smaller amounts of data. I suppose that's the answer, then: use the smallest that will work for your data, recognizing that you'd like to get more than one row on a page.

I'm going to leave this up with the answer rather than just deleting it in case someone else comes along looking for it.

Such a choice generally depends on the type of workload in your system. Briefly:

Transactional systems typically read a few records per request. The smallest unit of storage in Db2 is the data page: to read a single row, Db2 has to read the whole page from disk. The rows needed by a typical request often sit on random, unrelated pages, so it's not unusual for the number of disk I/Os to be close to the number of records a typical query returns. That makes it a 4K versus a 32K read per row returned: the larger page size causes more unnecessary I/O to deliver the same result set.
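To put illustrative numbers on that (invented for the example): a query that fetches 200 rows scattered across 200 distinct pages reads roughly 200 × 4 KB ≈ 0.8 MB with 4K pages, but 200 × 32 KB ≈ 6.4 MB with 32K pages, to deliver the same result set.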

DSS systems typically scan large amounts of data, which is generally done more efficiently with larger reads per I/O request.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange