non-optimal use of index
Question
I'm somewhat puzzled by the scenario below. The table involved (relevant parts) is defined as:
CREATE TABLE Nya. UPSEC_COURSE (
UPSEC_COURSE_ID CHAR(11) NOT NULL,
CORE_UPSEC_SUBJECT SMALLINT NOT NULL WITH DEFAULT 0,
POINT SMALLINT NOT NULL WITH DEFAULT 0,
CONVERTEDPOINT SMALLINT,
UPSEC_COURSETYPE_ID CHAR(5) NOT NULL with default 'N',
UPSEC_COURSE_SHORT CHAR(20) NOT NULL with default,
EQ_UPSEC_SUBJECT_ID CHAR(5),
UPSEC_MARKSCALE_ID SMALLINT NOT NULL,
SUBMIT_BY VARCHAR(130) NOT NULL ,
SUBMIT_TIME TIMESTAMP NOT NULL WITH EFAULT CURRENT TIMESTAMP,
UPSEC_COURSE VARCHAR(64) NOT NULL with default
) IN USERSPACE1 @
CREATE UNIQUE INDEX NYA.XPK_UPSEC_COURSE ON NYA.UPSEC_COURSE
(UPSEC_COURSE_ID) INCLUDE (UPSEC_COURSE, UPSEC_MARKSCALE_ID, EQ_UPSEC_SUBJECT_ID
,UPSEC_COURSETYPE_ID, CONVERTEDPOINT, CORE_UPSEC_SUBJECT)
ALLOW REVERSE SCANS
COLLECT SAMPLED DETAILED STATISTICS @
The query under concern is:
Original Statement:
------------------
SELECT
RTRIM(UC.EQ_UPSEC_SUBJECT_ID) AS EQ_UPSEC_SUBJECT_ID,
LEV.UPSEC_SUBJECTLEVEL_ID,
RTRIM(UC.UPSEC_COURSETYPE_ID)
FROM nya.UPSEC_COURSE UC
JOIN nya.UPSEC_SUBJECTLEVEL_COURSE LEV
ON UC.UPSEC_COURSE_ID = LEV.UPSEC_COURSE_ID
WHERE LEV.UPSEC_COURSE_ID = ?
and UC.UPSEC_COURSETYPE_ID in (?, ?)
AND EXISTS (
SELECT 1
FROM NYA.UPSEC_COURSE UC2
JOIN NYA.UPSEC_COURSE_WEIGHTED UCW
ON UC2.UPSEC_COURSE_ID = UCW.UPSEC_COURSE_ID
WHERE UC2.EQ_UPSEC_SUBJECT_ID = UC.EQ_UPSEC_SUBJECT_ID
)
All indexes and tables are reorged and have updated statistics.
The query reads roughly 10000 rows per execution due to a hash-join (I replaced the parameter markers with constants in these numbers)
NUM_EXECUTIONS ROWS_READ POOL_DATA_L_READS POOL_INDEX_L_READS
-------------------- -------------------- -------------------- --------------------
5 50535 1415 145
Access Plan:
-----------
Total Cost: 665.512
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
1
TBSCAN
( 2)
665.512
296
|
1
SORT
( 3)
665.512
296
|
6.04348
^HSJOIN
( 4)
665.511
296
/-------------+-------------\
6.04348 1
HSJOIN^ IXSCAN
( 5) ( 10)
652.652 12.8584
295 1
/---------+---------\ |
10014 2641 2663
TBSCAN NLJOIN INDEX: NYA
( 6) ( 7) XPK_UPSEC_SUBJECTLEVEL_COURSE
482.247 169.719 Q6
271 24
| /--------+--------\
10014 1 2641
TABLE: NYA IXSCAN IXSCAN
UPSEC_COURSE ( 8) ( 9)
Q5 25.7102 144.009
2 22
| |
10014 2641
INDEX: NYA INDEX: NYA
XPK_UPSEC_COURSE XPK_UPSEC_COURSE_WEIGHTED
Q7 Q4
As mentioned table and indexes are reorged and have updated statistics on. If I create an almost identical index
CREATE UNIQUE INDEX TMP.XPK_UPSEC_COURSE ON NYA.UPSEC_COURSE
(UPSEC_COURSE_ID) INCLUDE (UPSEC_COURSE, UPSEC_MARKSCALE_ID, EQ_UPSEC_SUBJECT_ID
,UPSEC_COURSETYPE_ID, CONVERTEDPOINT, CORE_UPSEC_SUBJECT, SUBMIT_BY)
ALLOW REVERSE SCANS
COLLECT SAMPLED DETAILED STATISTICS @
I.e. only difference is that I included the completely irrelevant column SUBMIT by in the INCLUDE, I get much nicer numbers:
NUM_EXECUTIONS ROWS_READ POOL_DATA_L_READS POOL_INDEX_L_READS
-------------------- -------------------- -------------------- --------------------
5 0 0 125
The plan is changed to:
Rows
RETURN
( 1)
Cost
I/O
|
1
NLJOIN
( 2)
123.687
34.985
/----------+----------\
1 1
IXSCAN ^NLJOIN
( 3) ( 5)
12.8584 110.829
1 33.985
| /---+----\
2663 1 6.14874
INDEX: NYA IXSCAN FILTER
XPK_UPSEC_SUBJECTLEVEL_COURSE ( 6) ( 7)
Q5 25.7101 609.94
2 229.197
| |
10107 2687
INDEX: NYA ^MSJOIN
XPK_UPSEC_COURSE ( 8)
Q6 609.77
229.197
/------+------\
2687 1
IXSCAN FILTER
( 9) ( 10)
144.02 471.243
22 210
| |
2687 10107
INDEX: NYA IXSCAN
XPK_UPSEC_COURSE_WEIGHTED ( 11)
Q3 471.243
210
|
10107
INDEX: TMP
XPK_UPSEC_COURSE
Q4
If I create the table in another schema with the original index definition I also get this nice behaviour.
Question is why db2 insist on using the index the way it does? I get the feeling that the index is not healthy and should be dropped and recreated. Only problem is that ~50 foreign keys are referencing the p.k. that the index supports, so I'd rather avoid that. Beside reorging (which I tried), what can be done to change the plan?
I noticed that INDEXREC = RESTART, so marking the index as invalid probably won't take effect until restart
Any thoughts?
EDIT: Added result of db2 reorgchk on table nya.upsec_course
Table statistics:
F1: 100 * OVERFLOW / CARD < 5
F2: 100 * (Effective Space Utilization of Data Pages) > 70
F3: 100 * (Required Pages / Total Pages) > 80
SCHEMA.NAME CARD OV NP FP ACTBLK TSIZE F1 F2 F3 REORG
----------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE
10107 0 273 273 - 1081449 0 100 100 ---
----------------------------------------------------------------------------------------
Index statistics:
F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80
F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE))
F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100
F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20
F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20
SCHEMA.NAME INDCARD LEAF ELEAF LVLS NDEL KEYS LEAF_RECSIZE NLEAF_RECSIZE LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD PCT_PAGES_SAVED F4 F5 F6 F7 F8 REORG
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE
Index: NYA.XAK1_UPSEC_COURSE
10107 77 0 2 0 10107 16 16 416 416 0 87 93 - 0 0 -----
Index: NYA.XPK_UPSEC_COURSE
10107 168 0 3 0 10107 47 11 296 496 0 87 90 73 0 0 -----
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I haven't used reorgchk in years (I usually just look at the number of overflows to determine if reorg is needed) but as far as I can tell the numbers looks decent enough.
Details about the other table involved. UPSEC_COURSE_WEIGHTED is a subset of UPSEC_COURSE:
CREATE TABLE NYA.UPSEC_COURSE_WEIGHTED (
UPSEC_COURSE_ID CHAR(11) NOT NULL
) IN USERSPACE1 @
CREATE UNIQUE INDEX NYA.XPK_UPSEC_COURSE_WEIGHTED ON NYA.UPSEC_COURSE_WEIGHTED
( UPSEC_COURSE_ID )
ALLOW REVERSE SCANS
COLLECT SAMPLED DETAILED STATISTICS @
ALTER TABLE NYA.UPSEC_COURSE_WEIGHTED ADD CONSTRAINT XPK_UPSEC_COURSE_WEIGHTED
PRIMARY KEY ( UPSEC_COURSE_ID ) @
ALTER TABLE NYA.UPSEC_COURSE_WEIGHTED ADD CONSTRAINT XFK_UPSEC_COURSE
FOREIGN KEY (UPSEC_COURSE_ID)
REFERENCES NYA.UPSEC_COURSE (UPSEC_COURSE_ID)
ON DELETE CASCADE
ON UPDATE RESTRICT @
Reorgchk for table nya.upsec_course_weighted
F1: 100 * OVERFLOW / CARD < 5
F2: 100 * (Effective Space Utilization of Data Pages) > 70
F3: 100 * (Required Pages / Total Pages) > 80
SCHEMA.NAME CARD OV NP FP ACTBLK TSIZE F1 F2 F3 REORG
----------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE_WEIGHTED
2687 0 15 15 - 56427 0 100 100 ---
----------------------------------------------------------------------------------------
Index statistics:
F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80
F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE))
F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100
F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20
F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20
SCHEMA.NAME INDCARD LEAF ELEAF LVLS NDEL KEYS LEAF_RECSIZE NLEAF_RECSIZE LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD PCT_PAGES_SAVED F4 F5 F6 F7 F8 REORG
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE_WEIGHTED
Index: NYA.XPK_UPSEC_COURSE_WEIGHTED
2687 22 0 2 12 2687 11 11 496 496 0 98 74 - 0 0 -----
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Solution
FWIW, I tried investigating the differences between a newly created index in a db copy and the original index in syscat.indexes, sysibm.sysindexes and sysstat.indexes. The only difference I could spot was a slightly better density (100) for the new index compared with the old one (87).
I therefore ended up recreating the index, and now the execution plan is much nicer. I created a small python script to generate drop and create statements for the f.k. pointing to the table. I had an old DB2.py driver at hand so the following snippet wont work with the official python driver shipped with 10.5. Nevertheless it should be easy to adopt to that, so I'll add the code anyhow:
#!/usr/bin/python
import sys
import getopt
import DB2
def main():
try:
opts, args = getopt.getopt(sys.argv[1:], "d:t:s:u:p:")
except getopt.GetoptError:
sys.exit(-1)
for o, a in opts:
if (o == "-d"):
dbname = a
if (o == "-t"):
tables = a.split(',')
if (o == "-s"):
schema = a
if (o == "-u"):
user = a
if (o == "-p"):
pwd = a
conn = DB2.connect(dsn=dbname, uid=user, pwd=pwd)
fk_sql = """
select tabschema, tabname, constname, FK_COLNAMES,
reftabschema, reftabname, PK_COLNAMES,
case DELETERULE WHEN 'C' then 'CASCADE'
WHEN 'R' then 'RESTRICT'
ELSE 'NO ACTION' end,
case UPDATERULE WHEN 'C' then 'CASCADE'
WHEN 'R' then 'RESTRICT'
ELSE 'NO ACTION' end
from syscat.references
where reftabname = ? and reftabschema = ?"""
c1 = conn.cursor()
c2 = conn.cursor()
create_stmts = []
drop_stmts = []
for t in tables:
c1.execute(fk_sql, (t,schema))
restore_sql = ''
for row in c1.fetchall():
tabscema = row[0]
tabname = row[1]
constname = row[2]
FK_COLNAMES = row[3]
reftabschema= row[4]
reftabname = row[5]
PK_COLNAMES = row[6]
DELETERULE = row[7]
UPDATERULE = row[8]
fk = filter ((lambda x:x<>''), FK_COLNAMES.split(' '))
cols = ''
for c in fk: cols = cols + ',' + c
fkcols = cols[1:]
pk = filter ((lambda x:x<>''), PK_COLNAMES.split(' '))
cols = ''
for c in pk: cols = cols + ',' + c
pkcols = cols[1:]
create = """
alter table %s.%s add constraint %s
foreign key (%s)
references %s.%s (%s)
on update %s
on delete %s;
""" % (tabscema,tabname,constname,fkcols,reftabschema, reftabname,pkcols,UPDATERULE,DELETERULE)
drop = """
alter table %s.%s drop constraint %s;
""" % (tabscema,tabname,constname)
create_stmts.append(create)
drop_stmts.append(drop)
conn.rollback()
for x in drop_stmts:
print x
for x in create_stmts:
print x
if __name__ == "__main__":
main()