Basic index compression - identify candidates and estimate prefix length


Goal: 

Identify candidates for index compression quickly using statistics before running
ANALYZE with VALIDATE STRUCTURE

Loosely based on:

This code contains heaps of assumptions and estimates, and has plenty of room for improvement.
Estimates are use: table, column and index statistics ( distinct_keys, leaf_blocks, num_rows, avg_col_len)

The last column gives an estimated savings in MB - a negative value means the index is expected to grow in size given the new prefix length.

Remember: 
  • The data stored in the index is not actually compressed, duplicate values are removed. 
  • A smaller index is good for memory as well as disk. 
  • If there is empty space above pctfree, the index will shrink even further than estimated when it is rebuilt with compression. You will also get this from validate structure. 
  • Read Richard Foote's blog for a more detailed explanation of how (and when) index compression works. 

select a.owner, index_name, table_name, leaf_blocks, cols, 
num_cols index_cols, distinct_keys , idx_rows, 
prefix_length new_prefix_length, 
prefix_distinct_keys estd_prefix_distinct_keys ,
round(( 
  /* space freed per row */ 
  ( prefix_byte_length * idx_rows ) 
  /* cost of prefix entries in leaf blocks  */ 
  - ( (prefix_byte_length + 5) * prefix_records ) /* 5ish extra bytes for each prefix entry */ 
)/1024/1024 ,2) estd_savings_MB
from (
select a.*, round( idx_rows/prefix_distinct_keys ,2) prefix_rows_per_key ,  avg_leaf_blocks_per_key*prefix_distinct_keys  prefix_records 
from( 
select /*+ ordered */  a.owner, a.index_name, a.table_name , a.leaf_blocks,
a.distinct_keys , a.avg_leaf_blocks_per_key, a.num_rows idx_rows 
,pl prefix_length, count(*) num_cols 
,listagg( b.column_name,', ') within group ( order by b.column_position  ) cols 
,listagg(  num_distinct,', ') within group ( order by b.column_position  ) num_distinct 
,case 
    when pl = bb.num_rows then max(  a.distinct_keys )
    when pl = 1 then  max( num_distinct  )
    else least(  max( num_distinct  ),  round(exp(sum(ln( decode( num_distinct,0,1,num_distinct) ))),0)*0.5 )
 end  prefix_distinct_keys 
,sum( avg_col_len ) prefix_byte_length 
from dba_indexes a , dba_ind_columns b , dba_tab_cols c , 
( select level pl from dual connect by rownum <= 5) d ,
( select count(*) num_rows ,  bb.index_name,  bb.index_owner  from dba_ind_columns bb group by bb.index_name,  bb.index_owner   ) bb  
where a.owner = '&SCHEMA.'  -- schema to analyze 
and a.prefix_length is null  --- not already compressed 
and a.leaf_blocks > 2  -- big enough to benefit  
and index_type = 'NORMAL' 
and bb.index_name = a.index_name and a.owner = bb.index_owner  
and a.index_name = b.index_name and a.owner = b.index_owner 
and c.table_name = a.table_name  and a.table_owner = c.owner 
and b.column_name = c.column_name 
and pl >=  b.column_position 
and pl  <= bb.num_rows -- ( select count(*) from dba_ind_columns bb where bb.index_name = a.index_name and a.owner = bb.index_owner ) 
group by  a.owner, a.index_name, a.table_name , a.leaf_blocks,
a.distinct_keys , a.avg_leaf_blocks_per_key, a.num_rows  ,pl, bb.num_rows
) a ) a 
order by estd_savings_MB  desc ;

Comments