Determine Index Compression Candidates Online

https://dba.stackexchange.com/questions/3469

16-10-2019

Question

By doing the following I can determine if an index will benefit from compression and how many columns should be included in the compression:

ANALYZE INDEX Owner.IndexName VALIDATE STRUCTURE OFFLINE;
SELECT Opt_Cmpr_PctSave, Opt_Cmpr_Count FROM Index_Stats;

The problem is that when OFFLINE is changed to ONLINE the Index_Stats view does not get populated. Is there an online way to determine the benefit of compressing an index and/or the number of columns that will produce optimal compression?

Update:

http://jonathanlewis.wordpress.com/index-definitions/ indicates that if Distinct_Keys from DBA_Indexes is "a lot smaller" than Num_Rows then the index is a good candidate for compression. This helps somewhat, but it isn't definitive and doesn't help determine the number of columns. He does give some guidelines for that, but nothing that can be determined programmatically without a bunch of dynamic SQL.
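As a rough sketch of that heuristic, a query along these lines lists uncompressed B-tree indexes with the most rows per distinct key first. The schema name 'OWNER' is a placeholder, the figures are only as fresh as the optimizer statistics, and a high ratio is only a hint, not a measurement of the saving:

```sql
-- Sketch only: 'OWNER' is a placeholder schema name.
-- num_rows and distinct_keys come from optimizer statistics, so they may be stale.
select owner, index_name, num_rows, distinct_keys,
       round(num_rows / nullif(distinct_keys, 0), 1) as rows_per_key
from   dba_indexes
where  owner = 'OWNER'
and    index_type = 'NORMAL'
and    compression = 'DISABLED'
and    num_rows > 0
order  by rows_per_key desc nulls last;
```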

Solution

The optimal number of columns to compress depends on:

- The number of entries that will fit in each block (this depends on the number of compressed columns, because each distinct prefix is stored only once per block)
- The average number of entries with the same prefix

Both factors can be estimated from the table's data.

The aim is to maximize the size of the compressed prefix whilst minimizing the number of blocks needed to hold all rows with the same prefix.

Assuming that the data is uniform at least to a degree, and ignoring the small amount of overhead compression introduces, you could attempt to implement this approach like this:

helper functions:

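create or replace function f_size( p_table_name in varchar2,
                                   p_column_name in varchar2)
                  return number as
  n number;
begin
  -- average stored size of the column in bytes, plus 1 (presumably for the
  -- column length byte); the names are concatenated unchecked, so pass
  -- trusted identifiers only
  execute immediate
    'select avg(vsize('||p_column_name||'))+1 from '||p_table_name into n;
  return n;
end;
/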

create or replace function f_count( p_table_name in varchar2,
                                    p_column_names in varchar2 )
                  return integer as
  n integer;
begin
  -- number of distinct combinations of the comma-separated columns
  -- in p_column_names, i.e. the number of distinct prefixes
  execute immediate 'select count(*) '||
                    'from ( select '|| p_column_names ||
                           ' from '||p_table_name||' '||
                           'group by '||p_column_names||' )'
          into n;
  return n;
end;
/

test IOT:

create table t ( k1, k2, k3, k4, k5, val, 
                 constraint pk_t primary key(k1, k2, k3, k4, k5)) 
       organization index as
select mod(k,10)||'_____', 
       mod(k,20)||'_____', 
       mod(k,30)||'_____', 
       mod(k,50)||'_____', 
       k||'_____', 
       lpad(' ',100)
from (select level as k from dual connect by level<=1000);

query:

with utc as ( select table_name, column_name,
                     f_size(table_name, column_name) as column_size
              from user_tab_columns
              where table_name='T' ),
     uic as ( select table_name, column_name, column_position, column_size
              from user_ind_columns join utc using(table_name, column_name)
              where index_name='PK_T' )
select z.*, (8192-prefix_size*prefixes_per_block)/remaining_size as rows_per_block
from( select z.*, greatest(1,8192/(prefix_size+rows_per_prefix*remaining_size)) as prefixes_per_block
      from( select z.*, total_count/distinct_count as rows_per_prefix
            from( select prefix_length, sum(column_size) as prefix_size,
                         (select sum(column_size) from utc)-sum(column_size) as remaining_size,
                         f_count(table_name, max(prefix_columns)) as distinct_count,
                         (select count(*) from t) as total_count
                  from( select table_name, connect_by_root column_position as prefix_length,
                               column_size,
                               substr(sys_connect_by_path(column_name, ','),2) as prefix_columns
                        from uic
                        connect by column_position=(prior column_position-1) )
                  group by table_name, prefix_length ) z ) z ) z
order by 1;

result:

PREFIX_LENGTH          PREFIX_SIZE            REMAINING_SIZE         DISTINCT_COUNT         TOTAL_COUNT            ROWS_PER_PREFIX        PREFIXES_PER_BLOCK     ROWS_PER_BLOCK         
---------------------- ---------------------- ---------------------- ---------------------- ---------------------- ---------------------- ---------------------- ---------------------- 
1                      7                      132.854                10                     1000                   100                    1                      61.608 
2                      14.5                   125.354                20                     1000                   50                     1.304                  65.200 
3                      22.161                 117.693                60                     1000                   16.666                 4.129                  68.827 
4                      29.961                 109.893                300                    1000                   3.333                  20.672                 68.909 
5                      38.854                 101                    1000                   1000                   1                      58.575                 58.575
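To make the arithmetic behind one row explicit, here is the PREFIX_LENGTH = 3 calculation done by hand. The inputs are the rounded figures copied from the table above and the 8192-byte block size that the query assumes, so the output should come out close to, but not exactly at, the 4.129 and 68.827 shown:

```sql
-- Sketch only: inputs are the rounded values from the PREFIX_LENGTH = 3 row;
-- 8192 is the block size assumed by the query above.
with r as ( select 8192    as block_size,
                   22.161  as prefix_size,
                   117.693 as remaining_size,
                   1000/60 as rows_per_prefix   -- total_count/distinct_count
            from dual ),
     p as ( select r.*,
                   greatest(1, block_size/(prefix_size+rows_per_prefix*remaining_size))
                     as prefixes_per_block
            from r )
select prefixes_per_block,
       (block_size-prefix_size*prefixes_per_block)/remaining_size as rows_per_block
from p;
```

Each prefix together with its rows takes about 22 + 16.7 * 118 bytes, so roughly four such groups fit in a block. Subtracting the space taken by those four prefixes and dividing by the suffix size gives the rows per block.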

check:

analyze index pk_t validate structure;
select opt_cmpr_pctsave, opt_cmpr_count from index_stats;

OPT_CMPR_PCTSAVE       OPT_CMPR_COUNT         
---------------------- ---------------------- 
13                     3

The check above roughly corresponds with the prefix length with the maximum rows_per_block in the calculation - but I suggest you check my working carefully for yourself before trusting it :)

I am assuming the table is so large that you can't just take a copy and try out different prefix lengths. Another approach would be to do just that on a sample of the data. The sample should be a random selection of prefixes for a given compression candidate (rather than just a random selection of rows), so that every sampled prefix keeps all of its rows.
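A minimal sketch of that sampling idea against the test IOT above, for the candidate prefix (k1, k2, k3). The names T_SAMPLE and PK_T_SAMPLE and the sample size of 10 prefixes are made up for illustration, and the distinct scan over the prefix columns still has to read the whole table once:

```sql
-- Sketch only: t_sample, pk_t_sample and the sample size of 10 prefixes are
-- illustrative choices. All rows of each sampled prefix are copied.
create table t_sample ( k1, k2, k3, k4, k5, val,
                        constraint pk_t_sample primary key(k1, k2, k3, k4, k5))
       organization index compress 3 as
select t.*
from t
where (k1, k2, k3) in ( select k1, k2, k3
                        from ( select k1, k2, k3
                               from ( select distinct k1, k2, k3 from t )
                               order by dbms_random.value )
                        where rownum <= 10 );

-- The copy is small and private, so the offline analysis is not a problem here.
analyze index pk_t_sample validate structure;
select lf_blks, used_space, opt_cmpr_count, opt_cmpr_pctsave from index_stats;

drop table t_sample purge;
```

Repeating this with a different COMPRESS value (and the matching prefix columns in the subquery) lets you compare LF_BLKS and USED_SPACE between candidates on the sample.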

OTHER TIPS

This is the issue:

http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/statements_4005.htm#SQLRF53681

When it is done ONLINE, no stats are gathered, so you don't get your answer.

I'm afraid you are going to have to do it during a quiet period.

Licensed under: CC-BY-SA with attribution
