non-optimal use of index

https://dba.stackexchange.com/questions/135617

01-10-2020
Question

I'm somewhat puzzled by the scenario below. The table involved (relevant parts) is defined as:

CREATE TABLE NYA.UPSEC_COURSE (
    UPSEC_COURSE_ID             CHAR(11) NOT NULL,
    CORE_UPSEC_SUBJECT          SMALLINT NOT NULL WITH DEFAULT 0,
    POINT                       SMALLINT NOT NULL WITH DEFAULT 0,
    CONVERTEDPOINT              SMALLINT,
    UPSEC_COURSETYPE_ID         CHAR(5) NOT NULL with default 'N',
    UPSEC_COURSE_SHORT          CHAR(20) NOT NULL with default,
    EQ_UPSEC_SUBJECT_ID         CHAR(5),
    UPSEC_MARKSCALE_ID          SMALLINT NOT NULL,
    SUBMIT_BY                   VARCHAR(130) NOT NULL , 
    SUBMIT_TIME                 TIMESTAMP NOT NULL WITH DEFAULT CURRENT TIMESTAMP,
    UPSEC_COURSE                VARCHAR(64) NOT NULL with default
) IN USERSPACE1 @

CREATE UNIQUE INDEX NYA.XPK_UPSEC_COURSE ON NYA.UPSEC_COURSE 
    (UPSEC_COURSE_ID) INCLUDE (UPSEC_COURSE, UPSEC_MARKSCALE_ID, EQ_UPSEC_SUBJECT_ID
                          ,UPSEC_COURSETYPE_ID, CONVERTEDPOINT, CORE_UPSEC_SUBJECT)
ALLOW REVERSE SCANS 
COLLECT SAMPLED DETAILED STATISTICS @

The query under concern is:

Original Statement:
------------------
SELECT 
  RTRIM(UC.EQ_UPSEC_SUBJECT_ID) AS EQ_UPSEC_SUBJECT_ID,
  LEV.UPSEC_SUBJECTLEVEL_ID,
  RTRIM(UC.UPSEC_COURSETYPE_ID) 
FROM nya.UPSEC_COURSE UC 
JOIN nya.UPSEC_SUBJECTLEVEL_COURSE LEV 
   ON UC.UPSEC_COURSE_ID = LEV.UPSEC_COURSE_ID 
WHERE LEV.UPSEC_COURSE_ID = ? 
  and UC.UPSEC_COURSETYPE_ID in (?, ?) 
  AND EXISTS (
      SELECT 1 
      FROM NYA.UPSEC_COURSE UC2 
      JOIN NYA.UPSEC_COURSE_WEIGHTED UCW 
          ON UC2.UPSEC_COURSE_ID = UCW.UPSEC_COURSE_ID 
      WHERE UC2.EQ_UPSEC_SUBJECT_ID = UC.EQ_UPSEC_SUBJECT_ID
  )

All indexes and tables are reorged and have updated statistics.

The query reads roughly 10000 rows per execution due to a hash join (for these numbers I replaced the parameter markers with constants):

NUM_EXECUTIONS       ROWS_READ            POOL_DATA_L_READS    POOL_INDEX_L_READS  
-------------------- -------------------- -------------------- --------------------
                   5                50535                 1415                  145
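As a quick sanity check on the snapshot above, the counters can be divided by the number of executions. This is only arithmetic on the figures reported in this question (including the second snapshot taken with the TMP index); the helper name `per_execution` is made up for illustration.

```python
def per_execution(num_executions, rows_read, data_l_reads, index_l_reads):
    """Average each monitor counter over the number of executions."""
    return {
        "rows_read": rows_read / num_executions,
        "data_l_reads": data_l_reads / num_executions,
        "index_l_reads": index_l_reads / num_executions,
    }


# Figures copied from the two snapshots in the question.
original = per_execution(5, 50535, 1415, 145)
with_tmp_index = per_execution(5, 0, 0, 125)

print(original)
print(with_tmp_index)
```

The 10107 rows read per execution with the original index equals CARD for NYA.UPSEC_COURSE in the reorgchk output, which is consistent with the TBSCAN of that table in the plan.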

Access Plan:
-----------
        Total Cost:             665.512
        Query Degree:           1

                                  Rows 
                                 RETURN
                                 (   1)
                                  Cost 
                                   I/O 
                                   |
                                    1 
                                 TBSCAN
                                 (   2)
                                 665.512 
                                   296 
                                   |
                                    1 
                                 SORT  
                                 (   3)
                                 665.512 
                                   296 
                                   |
                                 6.04348 
                                 ^HSJOIN
                                 (   4)
                                 665.511 
                                   296 
                     /-------------+-------------\
                 6.04348                            1 
                 HSJOIN^                         IXSCAN
                 (   5)                          (  10)
                 652.652                         12.8584 
                   295                              1 
         /---------+---------\                     |
      10014                   2641                2663 
     TBSCAN                  NLJOIN          INDEX: NYA     
     (   6)                  (   7)   XPK_UPSEC_SUBJECTLEVEL_COURSE
     482.247                 169.719               Q6
       271                     24 
       |              /--------+--------\
      10014          1                   2641 
 TABLE: NYA       IXSCAN                IXSCAN
  UPSEC_COURSE    (   8)                (   9)
       Q5         25.7102               144.009 
                     2                    22 
                    |                     |
                   10014                 2641 
              INDEX: NYA            INDEX: NYA     
             XPK_UPSEC_COURSE  XPK_UPSEC_COURSE_WEIGHTED
                    Q7                    Q4

As mentioned, the table and indexes are reorganized and have up-to-date statistics. If I create an almost identical index:

CREATE UNIQUE INDEX TMP.XPK_UPSEC_COURSE ON NYA.UPSEC_COURSE 
(UPSEC_COURSE_ID) INCLUDE (UPSEC_COURSE, UPSEC_MARKSCALE_ID, EQ_UPSEC_SUBJECT_ID
                          ,UPSEC_COURSETYPE_ID, CONVERTEDPOINT, CORE_UPSEC_SUBJECT, SUBMIT_BY)
ALLOW REVERSE SCANS 
COLLECT SAMPLED DETAILED STATISTICS @

I.e. the only difference is that I added the completely irrelevant column SUBMIT_BY to the INCLUDE list, I get much nicer numbers:

NUM_EXECUTIONS       ROWS_READ            POOL_DATA_L_READS    POOL_INDEX_L_READS  
-------------------- -------------------- -------------------- --------------------
                   5                    0                    0                  125

The plan is changed to:

                          Rows 
                          RETURN
                          (   1)
                           Cost 
                            I/O 
                            |
                             1 
                          NLJOIN
                          (   2)
                          123.687 
                          34.985 
                 /----------+----------\
                1                         1 
             IXSCAN                    ^NLJOIN
             (   3)                    (   5)
             12.8584                   110.829 
                1                      33.985 
               |                     /---+----\
              2663                  1         6.14874 
         INDEX: NYA              IXSCAN       FILTER
  XPK_UPSEC_SUBJECTLEVEL_COURSE  (   6)       (   7)
               Q5                25.7101      609.94 
                                    2         229.197 
                                   |            |
                                  10107        2687 
                             INDEX: NYA       ^MSJOIN
                            XPK_UPSEC_COURSE  (   8)
                                   Q6         609.77 
                                              229.197 
                                         /------+------\
                                      2687                1 
                                     IXSCAN            FILTER
                                     (   9)            (  10)
                                     144.02            471.243 
                                       22                210 
                                       |                 |
                                      2687              10107 
                                 INDEX: NYA            IXSCAN
                            XPK_UPSEC_COURSE_WEIGHTED  (  11)
                                       Q3              471.243 
                                                         210 
                                                         |
                                                        10107 
                                                   INDEX: TMP     
                                                  XPK_UPSEC_COURSE
                                                         Q4

If I create the table in another schema with the original index definition I also get this nice behaviour.

The question is why DB2 insists on using the index the way it does. I get the feeling that the index is not healthy and should be dropped and recreated. The only problem is that ~50 foreign keys reference the primary key that the index supports, so I'd rather avoid that. Besides reorganizing (which I tried), what can be done to change the plan?

I noticed that INDEXREC = RESTART, so marking the index as invalid probably won't take effect until a restart.

Any thoughts?

EDIT: Added result of db2 reorgchk on table nya.upsec_course

Table statistics:

F1: 100 * OVERFLOW / CARD < 5
F2: 100 * (Effective Space Utilization of Data Pages) > 70
F3: 100 * (Required Pages / Total Pages) > 80

SCHEMA.NAME                     CARD     OV     NP     FP ACTBLK    TSIZE  F1  F2  F3 REORG
----------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE
                               10107      0    273    273      -  1081449   0 100 100 --- 
----------------------------------------------------------------------------------------

Index statistics:

F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80
F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE))
F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100
F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20
F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20

SCHEMA.NAME                 INDCARD  LEAF ELEAF LVLS  NDEL    KEYS LEAF_RECSIZE NLEAF_RECSIZE LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD  PCT_PAGES_SAVED  F4  F5  F6  F7  F8 REORG  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE
Index: NYA.XAK1_UPSEC_COURSE
                              10107    77     0    2     0   10107           16            16                416                 416                0  87  93   -   0   0 ----- 
Index: NYA.XPK_UPSEC_COURSE
                              10107   168     0    3     0   10107           47            11                296                 496                0  87  90  73   0   0 ----- 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I haven't used reorgchk in years (I usually just look at the number of overflows to determine whether a reorg is needed), but as far as I can tell the numbers look decent enough.

Details about the other table involved. UPSEC_COURSE_WEIGHTED is a subset of UPSEC_COURSE:

CREATE TABLE NYA.UPSEC_COURSE_WEIGHTED (
    UPSEC_COURSE_ID CHAR(11) NOT NULL
) IN USERSPACE1 @

CREATE UNIQUE INDEX NYA.XPK_UPSEC_COURSE_WEIGHTED ON NYA.UPSEC_COURSE_WEIGHTED
    ( UPSEC_COURSE_ID ) 
ALLOW REVERSE SCANS 
COLLECT SAMPLED DETAILED STATISTICS @  

ALTER TABLE NYA.UPSEC_COURSE_WEIGHTED ADD CONSTRAINT XPK_UPSEC_COURSE_WEIGHTED
    PRIMARY KEY ( UPSEC_COURSE_ID ) @

ALTER TABLE NYA.UPSEC_COURSE_WEIGHTED ADD CONSTRAINT XFK_UPSEC_COURSE
    FOREIGN KEY (UPSEC_COURSE_ID)
    REFERENCES NYA.UPSEC_COURSE (UPSEC_COURSE_ID)
        ON DELETE CASCADE
        ON UPDATE RESTRICT @

Reorgchk for table nya.upsec_course_weighted

F1: 100 * OVERFLOW / CARD < 5
F2: 100 * (Effective Space Utilization of Data Pages) > 70
F3: 100 * (Required Pages / Total Pages) > 80

SCHEMA.NAME                     CARD     OV     NP     FP ACTBLK    TSIZE  F1  F2  F3 REORG
----------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE_WEIGHTED
                                2687      0     15     15      -    56427   0 100 100 --- 
----------------------------------------------------------------------------------------

Index statistics:

F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80
F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE))
F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100
F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20
F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20

SCHEMA.NAME                 INDCARD  LEAF ELEAF LVLS  NDEL    KEYS LEAF_RECSIZE NLEAF_RECSIZE LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD  PCT_PAGES_SAVED  F4  F5  F6  F7  F8 REORG  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE_WEIGHTED
Index: NYA.XPK_UPSEC_COURSE_WEIGHTED
                               2687    22     0    2    12    2687           11            11                496                 496                0  98  74   -   0   0 ----- 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
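To make two of the reorgchk formulas concrete, here is a minimal sketch that recomputes F1 and F7 from the figures reported above. F1 follows the formula as printed. For F7 the output only says "Total number of RIDs", so the sketch assumes that total is INDCARD plus NDEL; treat that as an assumption rather than the documented definition. The function names are made up for illustration.

```python
def f1_overflow_pct(overflow, card):
    """F1: 100 * OVERFLOW / CARD (reorg suggested when not below 5)."""
    return 100.0 * overflow / card


def f7_pseudo_deleted_pct(ndel, indcard):
    """F7: 100 * pseudo-deleted RIDs / total RIDs (threshold 20).

    Assumption: total RIDs = live RIDs (INDCARD) + pseudo-deleted RIDs (NDEL).
    """
    return 100.0 * ndel / (indcard + ndel)


# NYA.UPSEC_COURSE: OV = 0, CARD = 10107
f1 = f1_overflow_pct(0, 10107)

# NYA.XPK_UPSEC_COURSE_WEIGHTED: NDEL = 12, INDCARD = 2687
f7 = f7_pseudo_deleted_pct(12, 2687)

print("F1 = %.2f, F7 = %.2f" % (f1, f7))
```

Both values are far below their thresholds, which matches the 0 shown in the F1 and F7 columns and the absence of any REORG flags.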

Solution

FWIW, I tried investigating the differences between a newly created index in a db copy and the original index in syscat.indexes, sysibm.sysindexes and sysstat.indexes. The only difference I could spot was a slightly better density (100) for the new index compared with the old one (87).

I therefore ended up recreating the index, and now the execution plan is much nicer. I wrote a small Python script to generate drop and create statements for the foreign keys pointing to the table. I had an old DB2.py driver at hand, so the following snippet won't work with the official Python driver shipped with 10.5. Nevertheless, it should be easy to adapt, so I'll add the code anyhow:

#!/usr/bin/python

import sys
import getopt
import DB2

def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "d:t:s:u:p:")
    except getopt.GetoptError:
        sys.exit(-1)
    for o, a in opts:
        if (o == "-d"):
            dbname = a
        if (o == "-t"):
            tables = a.split(',')
        if (o == "-s"):
            schema = a
        if (o == "-u"):
            user = a
        if (o == "-p"):
            pwd = a

    conn = DB2.connect(dsn=dbname, uid=user, pwd=pwd)
    fk_sql = """
        select tabschema, tabname, constname, FK_COLNAMES,
               reftabschema, reftabname, PK_COLNAMES,
               case DELETERULE WHEN 'C' then 'CASCADE' 
                    WHEN 'R' then 'RESTRICT' 
                    ELSE 'NO ACTION' end,
               case UPDATERULE WHEN 'C' then 'CASCADE'  
                    WHEN 'R' then 'RESTRICT'  
                    ELSE 'NO ACTION' end
        from syscat.references 
        where reftabname = ? and reftabschema = ?"""
    c1 = conn.cursor()

    create_stmts = []
    drop_stmts = []

    for t in tables:
        c1.execute(fk_sql, (t, schema))
        for row in c1.fetchall():
            (tabschema, tabname, constname, fk_colnames,
             reftabschema, reftabname, pk_colnames,
             deleterule, updaterule) = row

            # FK_COLNAMES and PK_COLNAMES are space-separated lists;
            # split() without arguments discards the empty strings.
            fkcols = ','.join(fk_colnames.split())
            pkcols = ','.join(pk_colnames.split())

            create = """
                alter table %s.%s add constraint %s
                    foreign key (%s)
                        references %s.%s (%s)
                            on update %s
                            on delete %s;
            """ % (tabschema, tabname, constname, fkcols,
                   reftabschema, reftabname, pkcols, updaterule, deleterule)

            drop = """
                alter table %s.%s drop constraint %s;
            """ % (tabschema, tabname, constname)

            create_stmts.append(create)
            drop_stmts.append(drop)

    # Print once, after all tables are processed, so that no statement is
    # repeated: first every drop, then every create.
    conn.rollback()
    for x in drop_stmts:
        print(x)

    for x in create_stmts:
        print(x)

if __name__ == "__main__":
    main()
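Since the statement generation does not depend on the driver, one way to ease porting is to isolate it as a pure function that takes one row of the catalog query. The following is a minimal sketch, not part of the script I actually ran: the name `fk_statements` is made up, and the sample row is built from the XFK_UPSEC_COURSE constraint shown in the question, with the leading blank in the column-name fields assumed for illustration.

```python
def fk_statements(row):
    """Build (drop, create) statements from one row of the fk_sql query.

    The row layout is assumed to match the select list of the script:
    tabschema, tabname, constname, fk_colnames, reftabschema, reftabname,
    pk_colnames, delete rule text, update rule text.
    """
    (tabschema, tabname, constname, fk_colnames,
     reftabschema, reftabname, pk_colnames, deleterule, updaterule) = row

    # Column names arrive as one space-separated string.
    fkcols = ",".join(fk_colnames.split())
    pkcols = ",".join(pk_colnames.split())

    drop = "alter table %s.%s drop constraint %s;" % (
        tabschema, tabname, constname)
    create = ("alter table %s.%s add constraint %s foreign key (%s) "
              "references %s.%s (%s) on update %s on delete %s;") % (
        tabschema, tabname, constname, fkcols,
        reftabschema, reftabname, pkcols, updaterule, deleterule)
    return drop, create


# Sample row modelled on NYA.UPSEC_COURSE_WEIGHTED.XFK_UPSEC_COURSE.
sample_row = ("NYA", "UPSEC_COURSE_WEIGHTED", "XFK_UPSEC_COURSE",
              " UPSEC_COURSE_ID", "NYA", "UPSEC_COURSE",
              " UPSEC_COURSE_ID", "CASCADE", "RESTRICT")

drop, create = fk_statements(sample_row)
print(drop)
print(create)
```

With the function separated like this, only the connect and cursor calls need to change when switching drivers.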

Licensed under: CC-BY-SA with attribution
