Basic index compression - identify candidates and estimate prefix length
Goal:
Identify candidates for index compression quickly, using statistics, before running
ANALYZE with VALIDATE STRUCTURE.
This code contains heaps of assumptions and estimates, and has plenty of room for improvement.
The estimates use table, column, and index statistics (distinct_keys, leaf_blocks, num_rows, avg_col_len).
The last column gives the estimated savings in MB; a negative value means the index is expected to grow in size at the new prefix length.
Remember:
- The data stored in the index is not actually compressed; instead, duplicate prefix values are stored only once per leaf block.
- A smaller index is good for memory as well as disk.
- If the leaf blocks contain free space beyond pctfree, the index will shrink even further than estimated when it is rebuilt with compression. Validate structure will report this as well.
- Read Richard Foote's blog for a more detailed explanation of how (and when) index compression works.
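Once the query below has narrowed the list, a single candidate can be confirmed the authoritative way. A sketch, with a placeholder owner and index name:

```sql
-- Confirm a shortlisted candidate (replace scott.my_index with your own).
-- Note: VALIDATE STRUCTURE locks the underlying table while it runs,
-- so avoid this on a busy production system.
analyze index scott.my_index validate structure;

-- INDEX_STATS holds one row, for the most recently validated index.
-- OPT_CMPR_COUNT is Oracle's recommended prefix length and
-- OPT_CMPR_PCTSAVE the expected percentage saving at that length.
select name, height, lf_blks, opt_cmpr_count, opt_cmpr_pctsave
from   index_stats;
```

Comparing opt_cmpr_count and opt_cmpr_pctsave against the estimates below is a good sanity check of the approach.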
select a.owner, index_name, table_name, leaf_blocks, cols,
num_cols index_cols, distinct_keys , idx_rows,
prefix_length new_prefix_length,
prefix_distinct_keys estd_prefix_distinct_keys ,
round((
/* space freed per row */
( prefix_byte_length * idx_rows )
/* cost of prefix entries in leaf blocks */
- ( (prefix_byte_length + 5) * prefix_records ) /* 5ish extra bytes for each prefix entry */
)/1024/1024 ,2) estd_savings_MB
from (
select a.*, round( idx_rows/prefix_distinct_keys ,2) prefix_rows_per_key , avg_leaf_blocks_per_key*prefix_distinct_keys prefix_records
from(
select /*+ ordered */ a.owner, a.index_name, a.table_name , a.leaf_blocks,
a.distinct_keys , a.avg_leaf_blocks_per_key, a.num_rows idx_rows
,pl prefix_length, count(*) num_cols
,listagg( b.column_name,', ') within group ( order by b.column_position ) cols
,listagg( num_distinct,', ') within group ( order by b.column_position ) num_distinct
,case
when pl = bb.num_rows then max( a.distinct_keys )
when pl = 1 then max( num_distinct )
else least( max( num_distinct ), round(exp(sum(ln( decode( num_distinct,0,1,num_distinct) ))),0)*0.5 )
end prefix_distinct_keys
,sum( avg_col_len ) prefix_byte_length
from dba_indexes a , dba_ind_columns b , dba_tab_cols c ,
( select level pl from dual connect by rownum <= 5) d ,
( select count(*) num_rows , bb.index_name, bb.index_owner from dba_ind_columns bb group by bb.index_name, bb.index_owner ) bb
where a.owner = '&SCHEMA.' -- schema to analyze
and a.prefix_length is null --- not already compressed
and a.leaf_blocks > 2 -- big enough to benefit
and index_type = 'NORMAL'
and bb.index_name = a.index_name and a.owner = bb.index_owner
and a.index_name = b.index_name and a.owner = b.index_owner
and c.table_name = a.table_name and a.table_owner = c.owner
and b.column_name = c.column_name
and pl >= b.column_position
and pl <= bb.num_rows -- ( select count(*) from dba_ind_columns bb where bb.index_name = a.index_name and a.owner = bb.index_owner )
group by a.owner, a.index_name, a.table_name , a.leaf_blocks,
a.distinct_keys , a.avg_leaf_blocks_per_key, a.num_rows ,pl, bb.num_rows
) a ) a
order by estd_savings_MB desc ;
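When a candidate and prefix length look worthwhile, the index is rebuilt with prefix compression. A sketch, again with a placeholder index name and prefix length:

```sql
-- Rebuild with a 2-column compressed prefix (placeholder name and length).
-- ONLINE keeps the table available during the rebuild (Enterprise Edition).
alter index scott.my_index rebuild compress 2 online;

-- To undo compression later:
-- alter index scott.my_index rebuild nocompress;
```

After the rebuild, dba_indexes.prefix_length is populated, so the index drops out of this script on the next run.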