Filter by date range based on PK id.
Understanding how data gets into a table means you have opportunities
to optimize the way a query is executed beyond what is available to the
optimizer.
In this scenario, I have an audit table where the primary
key is a sequence, and represents the only index on a table with years of data.
The table also has a date column that represents when a
row was inserted.
I receive a request for an extract of changes for a short
timeframe ( say, last Tuesday)
If I know for sure that the date column and pk are both
ordered the same (more or less), I can use the PK to filter by date allowing a small index
range scan to bring back the relevant rows instead of a FTS.
Prerequisites:
- I know the sequence is inserted in time order (more or less).
- There aren’t many gaps in the table.
select * from owner.audittab
where aud_date between '22-AUG-2017 09:00:00'
and '22-AUG-2017 13:00:00';
----------------------------------------------------------------------
| Id |
Operation | Name | Rows
| Bytes | Cost (%CPU)|
----------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
3939 | 423K| 184K
(1)|
| 1 | TABLE ACCESS FULL| AUDITTAB | 3939
| 423K| 184K
(1)|
----------------------------------------------------------------------
Get the high value for the PK:
select max(id) maxd from
owner.audittab;
MAX_ID
----------
47,634,622
Get a distributed list of mapped id to date values (5000)
values.
select level l, round ((max_id/num_groups)*(level-1),0)
id
from (select max(aud_id) max_id, 1000 num_groups from
owner.audittab) a
connect by level <= num_groups ;
L ID
---------- ----------
1 1
2 51402
3 102803
4 154204
5 205606
6 257007
...
...
...
...
996 51144147
997 51195548
998 51246950
999 51298351
1000 51349752
Narrow down the last ID before your date range, and the
first one after it.
select
max( case when auddate < '22-AUG-2017 09:00:00'
then id end ) start_id ,
min( case when auddate > '22-AUG-2017 13:00:00'
then id end ) end_id
from (
select
a.id, aud_date auddate
from(
select
level l , round ((max_id/num_groups)*(level-1),0)+1 id
from (
select max(aud_id) max_id , 1000 num_groups from owner.audittab ) a
connect by
level <= num_groups
) a , owner.audittab b
where a.id = b.aud_id
);
----------------------------------------------------------------------------------------
| Id |
Operation |
Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | | 1 |
27 | 5 (0)|
| 1 | SORT AGGREGATE | | 1 |
27 | |
| 2 | NESTED LOOPS | | 1 |
27 | 5 (0)|
| 3 | NESTED LOOPS | | 1 |
27 | 5 (0)|
| 4 | VIEW | | 1 |
13 | 3 (0)|
| 5 | CONNECT BY WITHOUT FILTERING| | |
| |
| 6 | VIEW | | 1 |
16 | 3 (0)|
| 7 | SORT AGGREGATE | | 1 |
6 | |
| 8 | INDEX FULL SCAN (MIN/MAX)|
PK_AUDITTAB | 1 | 6 |
3 (0)|
| 9 | INDEX UNIQUE SCAN | PK_AUDITTAB | 1 |
| 1 (0)|
| 10 | TABLE ACCESS BY INDEX ROWID | SECURITYLOG |
1 | 14 | 2
(0)|
----------------------------------------------------------------------------------------
START_ID
END_ID
------------ ------------
51,092,941 51,195,744
If end_id comes back null , use the max_id from above (47,634,622)
Use the selected ID range to filter before refining by the
requested date range.
select * from owner.audittab
where aud_id between
51092941 and 51195744
and aud_date between '22-AUG-2017 09:00:00' and '22-AUG-2017 13:00:00';
-----------------------------------------------------------------------------------
| Id |
Operation | Name | Rows | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | | 9 |
990 | 2346 (1)|
| 1 | TABLE ACCESS BY INDEX ROWID| AUDITTAB |
9 | 990 | 2346
(1)|
| 2 | INDEX RANGE SCAN | PK_AUDITTAB |
102K| | 397
(1)|
-----------------------------------------------------------------------------------
This method can be used in applications that take a date
range as parameters – to allow searching without adding an extra index.
Comments
Post a Comment