With partitioned table, How can I use hints to group each partition separately
-
16-01-2021 - |
Question
Suppose I have the following table
data(partitioned_key_index, some_dummy_measure)
Assume that partitions are of an equal large size. with Oracle 11g.
The end result should be like this
select partitioned_key_index, sum(some_dummy_measure)
from data group by partitioned_key_index
Each partition will be grouped independently, the optimizer should be
clever enough to come up with a plan in which each partition will be aggregated then a simple 'union all' to get the desired output.
What I want to do is something close to this
select 1 as partitioned_key_index, sum(some_dummy_measure)
from data where partitioned_key_index = 1
Union All
select 2 as partitioned_key_index, sum(some_dummy_measure)
from data where partitioned_key_index = 2
Union All
.
.
.
select i as partitioned_key_index, sum(some_dummy_measure)
from data where partitioned_key_index = i
My intuition with the above method is to to serialize the hash group operation
thus each partition will be moved from the disk to the buffer cache with the hope of not spilling into disk for the group by operation.
Any ideas how to tune this kind of queries?
Solution
The database does this by default, no hints/tuning needed.
create table data(partitioned_key_index number, some_dummy_measure number)
partition by list (partitioned_key_index)
(
partition p1 values (1),
partition p2 values (2),
partition p3 values (3),
partition p4 values (4),
partition p5 values (5)
);
insert into data with g as (select * from dual connect by level <= 1000)
select mod(rownum, 5) + 1, rownum from g,g where rownum <= 500000;
commit;
Then run the query:
alter session set statistics_level=all;
select partitioned_key_index, sum(some_dummy_measure) from data
group by partitioned_key_index;
PARTITIONED_KEY_INDEX SUM(SOME_DUMMY_MEASURE)
--------------------- -----------------------
1 25000250000
2 24999850000
3 24999950000
4 25000050000
5 25000150000
Check what happened:
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
SQL_ID 537qapda9hdy4, child number 0
-------------------------------------
select partitioned_key_index, sum(some_dummy_measure) from data group
by partitioned_key_index
Plan hash value: 3405952922
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5 |00:00:00.14 | 1065 | | | |
| 1 | PARTITION LIST ALL | | 1 | 585K| 5 |00:00:00.14 | 1065 | | | |
| 2 | HASH GROUP BY | | 5 | 585K| 5 |00:00:00.14 | 1065 | 34M| 6473K| 738K (0)|
| 3 | TABLE ACCESS FULL| DATA | 5 | 585K| 500K|00:00:00.06 | 1065 | | | |
-----------------------------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement (level=2)
Operations below PARTITION LIST ALL
were done for all partitions. We have 5 partitions, including the GROUP BY
. As you can see from the Starts
column, HASH GROUP BY
was really performed 5 times.
This is how it looks like, when GROUP BY
is performed for the whole table at once:
select (partitioned_key_index + 0), sum(some_dummy_measure) from data
group by (partitioned_key_index + 0);
(PARTITIONED_KEY_INDEX+0) SUM(SOME_DUMMY_MEASURE)
------------------------- -----------------------
1 25000250000
2 24999850000
5 25000150000
4 25000050000
3 24999950000
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2xxf42mtp53sc, child number 0
-------------------------------------
select (partitioned_key_index + 0), sum(some_dummy_measure) from data
group by (partitioned_key_index + 0)
Plan hash value: 3651737839
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5 |00:00:00.24 | 1065 | | | |
| 1 | HASH GROUP BY | | 1 | 585K| 5 |00:00:00.24 | 1065 | 34M| 6473K| 4574K (0)|
| 2 | PARTITION LIST ALL| | 1 | 585K| 500K|00:00:00.14 | 1065 | | | |
| 3 | TABLE ACCESS FULL| DATA | 5 | 585K| 500K|00:00:00.06 | 1065 | | | |
-----------------------------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement (level=2)
20 rows selected.
HASH GROUP BY
was performend only once, for the whole amount of data, after it was collected from all partitions.
OTHER TIPS
Your intuition is correct. So far Oracle can not iterate over partitions and have a different exec plan for each of them. The only exception is new 12c feature, where a local index can be missing on some partitions.
General recommendation is that each partition should have similar statistical characteristic - or you will face problems with exec plan stability.
You can you use union as you did or in same cases 2 level partitioning(plus row movement) might help. On 1st level distinguish data as live vs. processed. On 2nd level partition them by time.