With partitioned table, How can I use hints to group each partition separately

https://dba.stackexchange.com/questions/223140

16-01-2021
|

Question

Suppose I have the following table
data(partitioned_key_index, some_dummy_measure)

Assume that partitions are of an equal large size. with Oracle 11g.

The end result should be like this

select partitioned_key_index, sum(some_dummy_measure) 
  from data group by partitioned_key_index

Each partition will be grouped independently, the optimizer should be
clever enough to come up with a plan in which each partition will be aggregated then a simple 'union all' to get the desired output.

What I want to do is something close to this

 select 1 as partitioned_key_index, sum(some_dummy_measure) 
  from data where partitioned_key_index = 1  
 Union All
 select 2 as partitioned_key_index, sum(some_dummy_measure) 
  from data where partitioned_key_index = 2
 Union All
  .
  .
  .
 select i as partitioned_key_index, sum(some_dummy_measure) 
  from data where partitioned_key_index = i

My intuition with the above method is to to serialize the hash group operation
thus each partition will be moved from the disk to the buffer cache with the hope of not spilling into disk for the group by operation.

Any ideas how to tune this kind of queries?

Solution

The database does this by default, no hints/tuning needed.

create table data(partitioned_key_index number, some_dummy_measure number)
  partition by list (partitioned_key_index)
(
     partition p1 values (1),
     partition p2 values (2),
     partition p3 values (3),
     partition p4 values (4),
     partition p5 values (5)
);

insert into data with g as (select * from dual connect by level <= 1000) 
select mod(rownum, 5) + 1, rownum from g,g where rownum <= 500000;
commit;

Then run the query:

alter session set statistics_level=all;

select partitioned_key_index, sum(some_dummy_measure) from data 
  group by partitioned_key_index;


PARTITIONED_KEY_INDEX SUM(SOME_DUMMY_MEASURE)
--------------------- -----------------------
                    1             25000250000
                    2             24999850000
                    3             24999950000
                    4             25000050000
                    5             25000150000

Check what happened:

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
SQL_ID  537qapda9hdy4, child number 0
-------------------------------------
select partitioned_key_index, sum(some_dummy_measure) from data group
by partitioned_key_index

Plan hash value: 3405952922

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |00:00:00.14 |    1065 |       |       |          |
|   1 |  PARTITION LIST ALL |      |      1 |    585K|      5 |00:00:00.14 |    1065 |       |       |          |
|   2 |   HASH GROUP BY     |      |      5 |    585K|      5 |00:00:00.14 |    1065 |    34M|  6473K|  738K (0)|
|   3 |    TABLE ACCESS FULL| DATA |      5 |    585K|    500K|00:00:00.06 |    1065 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Operations below PARTITION LIST ALL were done for all partitions. We have 5 partitions, including the GROUP BY. As you can see from the Starts column, HASH GROUP BY was really performed 5 times.

This is how it looks like, when GROUP BY is performed for the whole table at once:

select (partitioned_key_index + 0), sum(some_dummy_measure) from data
    group by (partitioned_key_index + 0);

(PARTITIONED_KEY_INDEX+0) SUM(SOME_DUMMY_MEASURE)
------------------------- -----------------------
                        1             25000250000
                        2             24999850000
                        5             25000150000
                        4             25000050000
                        3             24999950000

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  2xxf42mtp53sc, child number 0
-------------------------------------
select (partitioned_key_index + 0), sum(some_dummy_measure) from data
group by (partitioned_key_index + 0)

Plan hash value: 3651737839

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |00:00:00.24 |    1065 |       |       |          |
|   1 |  HASH GROUP BY      |      |      1 |    585K|      5 |00:00:00.24 |    1065 |    34M|  6473K| 4574K (0)|
|   2 |   PARTITION LIST ALL|      |      1 |    585K|    500K|00:00:00.14 |    1065 |       |       |          |
|   3 |    TABLE ACCESS FULL| DATA |      5 |    585K|    500K|00:00:00.06 |    1065 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


20 rows selected.

HASH GROUP BY was performend only once, for the whole amount of data, after it was collected from all partitions.

OTHER TIPS

Your intuition is correct. So far Oracle can not iterate over partitions and have a different exec plan for each of them. The only exception is new 12c feature, where a local index can be missing on some partitions.

General recommendation is that each partition should have similar statistical characteristic - or you will face problems with exec plan stability.

You can you use union as you did or in same cases 2 level partitioning(plus row movement) might help. On 1st level distinguish data as live vs. processed. On 2nd level partition them by time.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange