Oracle SQL strategy for slow parameterized/filtered queries due to static optimizer strategy

https://stackoverflow.com/questions/22379074

14-06-2023
|

Question

More simply put than the below: if one has one or multiple query parameters, e.g. x_id, (or report / table function parameters) that are performance crucial (e.g. some primary key index can be used) and it may be (depending on the use case/report filters applied, ...) one of

null
an exact match (e.g. some unique id)
a like expression
or even a regexp expression

then if all these possibilities are coded in a single query, I only see and know that the optimizer will

generate a unique static plan, independent of the actual parameter runtime-value
and thus can't assume to use some index on x_id although it may be e.g. some exact match

Are there ather ways to handle this than to

let some PL/SQL code choose out of n predefined and use case optimized queries/views?
- which can be quite large the more such flexible parameters one has
or some manually string-constructed and dynamically compiled query?

Basically I have two slightly different use cases/questions as documented and executable below:

A - select * from tf_sel

B - select * from data_union

which could potentially be solved via SQL hints or using some other trick.

To speed these queries up I am currently separating the "merged queries" on a certain implementation level (table function) which is quite cumbersome and harder to maintain, but assures the queries are running quite fast due their better execution plan.

As I see it, the main problem seems the static nature of the optimizer sql plan that is always the same altough it could be much more efficient, if it would consider some "query-time-constant" filter parameters.

with
    -- Question A: What would be a good strategy to make tf_sel with tf_params nearly as fast as query_use_case_1_eq
    --             which actually provides the same result?
    --
    -- - a complex query should be used in various reports with filters
    -- - we want to keep as much as possible filter functionality on the db side (not the report engine side)
    --   to be able to utilize the fast and efficient db engine and for loosely coupled software design


    complex_query as (  -- just some imaginable complex query with a lot of table/view joins, aggregation/analytical functions etc.
        select 1 as id, 'ab12' as indexed_val, 'asdfasdf' x from dual
        union all select 2, 'ab34', 'a uiop345' from dual
        union all select 3, 'xy34', 'asdf  0u0duaf' from dual
        union all select 4, 'xy55', ' asdja´sf asd' from dual
    )


-- <<< comment the following lines in to test it with the above

--  , query_use_case_1_eq as (  -- quite fast and maybe the 95% use case
--      select * from complex_query where indexed_val = 'ab12'
--  )
--select * from query_use_case_1_eq 

-- >>>

-- ID INDEXED_VAL X
-- -- ----------- --------
--  1 ab12        asdfasdf


-- <<< comment the following lines in to test it with the above

--  , query_use_case_2_all as (  -- significantly slower due to a lot of underlying calculations
--      select * from complex_query
--  )
--select * from query_use_case_2_all

-- >>>

-- ID INDEXED_VAL X
-- -- ----------- -------------
--  1 ab12        asdfasdf
--  2 ab34        a uiop345
--  3 xy34        asdf  0u0duaf
--  4 xy55         asdja´sf asd


-- <<< comment the following lines in to test it with the above

--  , query_use_case_3_like as (
--      select * from complex_query where indexed_val like 'ab%'
--  )
--select * from query_use_case_3_like

-- >>>

-- ID INDEXED_VAL X
-- -- ----------- ---------
--  1 ab12        asdfasdf
--  2 ab34        a uiop345


-- <<< comment the following lines to simulate the table function

    , tf_params as (  -- table function params: imagine we have a table function where these are passed depending on the report
        select  'ab12' p_indexed_val,  'eq' p_filter_type  from dual
    )
    , tf_sel as (  -- table function select: nicely integrating all query possiblities, but beeing veeery slow :-(
        select q.* 
        from 
            tf_params p  -- just here so this example works without the need for the actual function
            join complex_query q on (1=1)
        where
                p_filter_type = 'all'
            or (p_filter_type = 'eq' and indexed_val = p_indexed_val)
            or (p_filter_type = 'like' and indexed_val like p_indexed_val)
            or (p_filter_type = 'regexp' and regexp_like(indexed_val, p_indexed_val))
    )

-- actually we would pass the tf_params above if it were a real table function
select * from tf_sel

-- >>>

-- ID INDEXED_VAL X
-- -- ----------- --------
--  1 ab12        asdfasdf



-- Question B: How can we speed up data_union with dg_filter to be as fast as the data_group1 query which
--             actually provides the same result? 
-- 
-- A very similar approach is considered in other scenarios where we like to join the results of 
-- different queries (>5) returning joinable data and beeing filtered based on the same parameters.

-- <<< comment the following lines to simulate the union problem

--  , data_group1 as (  -- may run quite fast
--      select 'dg1' dg_id, q.* from complex_query q where x < 'a'  -- just an example returning some special rows that should be filtered later on!
--  )
--  
--  , data_group2 as (  -- may run quite fast
--      select 'dg2' dg_id, q.* from complex_query q where instr(x,'p') >= 0  -- just an example returning some special rows that should be filtered later on!
--  )   
--  
--  
--  , dg_filter as (  -- may be set by a report or indirectly by user filters
--      select  'dg1' dg_id  from dual
--  )
--  
--  , data_union as (  -- runs much slower due to another execution plan
--      select * from (
--          select * from data_group1 
--          union all select * from data_group2
--      )
--      where dg_id in (select dg_id from dg_filter)
--  )
--
--select * from data_union

-- >>>

-- DG_ID ID INDEXED_VAL X
-- ----- -- ----------- -------------
-- dg1    4 xy55         asdja´sf asd

this is a comment to the sample code and answer provided by jonearles

Actually your answer was a mix up of my (unrelated although occuring together in certain scenarios) use cases A and B. Although it's nevertheless essential that you mentioned the optimizer has dynamic FILTER and maybe other capabilities.

use case B ("data partition/group union")

Actually use case B (based on your sample table) looks more like this, but I still have to check for the performance issue in the real scenario. Maybe you can see some problems with it already?

select * from (
    select 'dg1' data_group, x.* from sample_table x 
        where mod(to_number(some_other_column1), 100000) = 0  -- just some example restriction
            --and indexed_val = '3635'  -- commenting this in and executing this standalone returns:
            ----------------------------------------------------------------------------------------
            --| Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)|
            ----------------------------------------------------------------------------------------
            --|   0 | SELECT STATEMENT            |                   |     1 |    23 |     2   (0)|
            --|   1 |  TABLE ACCESS BY INDEX ROWID| SAMPLE_TABLE      |     1 |    23 |     2   (0)|
            --|   2 |   INDEX RANGE SCAN          | SAMPLE_TABLE_IDX1 |     1 |       |     1   (0)|
            ----------------------------------------------------------------------------------------            
    union all
    select 'dg2', x.* from sample_table x
        where mod(to_number(some_other_column2), 9999) = 0  -- just some example restriction
    union all
    select 'dg3', x.* from sample_table x
        where mod(to_number(some_other_column3), 3635) = 0  -- just some example restriction
)
where data_group in ('dg1') and indexed_val = '35'

-------------------------------------------------------------------------------------------
--| Id  | Operation                      | Name              | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------
--|   0 | SELECT STATEMENT               |                   |     3 |   639 |     2   (0)|
--|   1 |  VIEW                          |                   |     3 |   639 |     2   (0)|
--|   2 |   UNION-ALL                    |                   |       |       |            |
--|   3 |    TABLE ACCESS BY INDEX ROWID | SAMPLE_TABLE      |     1 |    23 |     2   (0)|
--|   4 |     INDEX RANGE SCAN           | SAMPLE_TABLE_IDX1 |     1 |       |     1   (0)|
--|   5 |    FILTER                      |                   |       |       |            |
--|   6 |     TABLE ACCESS BY INDEX ROWID| SAMPLE_TABLE      |     1 |    23 |     2   (0)|
--|   7 |      INDEX RANGE SCAN          | SAMPLE_TABLE_IDX1 |     1 |       |     1   (0)|
--|   8 |    FILTER                      |                   |       |       |            |
--|   9 |     TABLE ACCESS BY INDEX ROWID| SAMPLE_TABLE      |     1 |    23 |     2   (0)|
--|  10 |      INDEX RANGE SCAN          | SAMPLE_TABLE_IDX1 |     1 |       |     1   (0)|
-------------------------------------------------------------------------------------------

use case A (filtering by column query type)

Based on your sample table this is more like what I wanna do. As you can see the query with just the fast where p.ft_id = 'eq' and x.indexed_val = p.val shows the index usage, but having all the different filter options in the where clause will cause the plan switch to always use a full table scan :-/ (Even if I use the :p_filter_type and :p_indexed_val_filter everywhere in the SQL than just in the one spot I put it, it won't change.)

with 
    filter_type as (
        select 'all' as id from dual
        union all select 'eq' as id from dual
        union all select 'like' as id from dual
        union all select 'regexp' as id from dual
    )
    , params as (
        select 
            (select * from filter_type where id = :p_filter_type) as ft_id,
            :p_indexed_val_filter as val
        from dual
    )
select * 
from params p
    join sample_table x on (1=1)
    -- the following with the above would show the 'eq' use case with a fast index scan (plan id 14/15)
    --where p.ft_id = 'eq' and x.indexed_val = p.val
    ------------------------------------------------------------------------------------------
    --| Id  | Operation                     | Name              | Rows  | Bytes | Cost (%CPU)|
    ------------------------------------------------------------------------------------------
    --|   0 | SELECT STATEMENT              |                   |     1 |    23 |    12   (0)|
    --|   1 |  VIEW                         |                   |     4 |    20 |     8   (0)|
    --|   2 |   UNION-ALL                   |                   |       |       |            |
    --|   3 |    FILTER                     |                   |       |       |            |
    --|   4 |     FAST DUAL                 |                   |     1 |       |     2   (0)|
    --|   5 |    FILTER                     |                   |       |       |            |
    --|   6 |     FAST DUAL                 |                   |     1 |       |     2   (0)|
    --|   7 |    FILTER                     |                   |       |       |            |
    --|   8 |     FAST DUAL                 |                   |     1 |       |     2   (0)|
    --|   9 |    FILTER                     |                   |       |       |            |
    --|  10 |     FAST DUAL                 |                   |     1 |       |     2   (0)|
    --|  11 |  FILTER                       |                   |       |       |            |
    --|  12 |   NESTED LOOPS                |                   |     1 |    23 |     4   (0)|
    --|  13 |    FAST DUAL                  |                   |     1 |       |     2   (0)|
    --|  14 |    TABLE ACCESS BY INDEX ROWID| SAMPLE_TABLE      |     1 |    23 |     2   (0)|
    --|  15 |     INDEX RANGE SCAN          | SAMPLE_TABLE_IDX1 |     1 |       |     1   (0)|
    --|  16 |   VIEW                        |                   |     4 |    20 |     8   (0)|
    --|  17 |    UNION-ALL                  |                   |       |       |            |
    --|  18 |     FILTER                    |                   |       |       |            |
    --|  19 |      FAST DUAL                |                   |     1 |       |     2   (0)|
    --|  20 |     FILTER                    |                   |       |       |            |
    --|  21 |      FAST DUAL                |                   |     1 |       |     2   (0)|
    --|  22 |     FILTER                    |                   |       |       |            |
    --|  23 |      FAST DUAL                |                   |     1 |       |     2   (0)|
    --|  24 |     FILTER                    |                   |       |       |            |
    --|  25 |      FAST DUAL                |                   |     1 |       |     2   (0)|
    ------------------------------------------------------------------------------------------  
where 
    --mod(to_number(some_other_column1), 3000) = 0 and  -- just some example restriction
    (
            p.ft_id = 'all'
        or
            p.ft_id = 'eq' and x.indexed_val = p.val
        or 
            p.ft_id = 'like' and x.indexed_val like p.val
        or 
            p.ft_id = 'regexp' and regexp_like(x.indexed_val, p.val)
    )
-- with the full flexibility of the filter the plan shows a full table scan (plan id 13) :-(    
--------------------------------------------------------------------------
--| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
--|   0 | SELECT STATEMENT   |              |  1099 | 25277 |   115   (3)|
--|   1 |  VIEW              |              |     4 |    20 |     8   (0)|
--|   2 |   UNION-ALL        |              |       |       |            |
--|   3 |    FILTER          |              |       |       |            |
--|   4 |     FAST DUAL      |              |     1 |       |     2   (0)|
--|   5 |    FILTER          |              |       |       |            |
--|   6 |     FAST DUAL      |              |     1 |       |     2   (0)|
--|   7 |    FILTER          |              |       |       |            |
--|   8 |     FAST DUAL      |              |     1 |       |     2   (0)|
--|   9 |    FILTER          |              |       |       |            |
--|  10 |     FAST DUAL      |              |     1 |       |     2   (0)|
--|  11 |  NESTED LOOPS      |              |  1099 | 25277 |   115   (3)|
--|  12 |   FAST DUAL        |              |     1 |       |     2   (0)|
--|  13 |   TABLE ACCESS FULL| SAMPLE_TABLE |  1099 | 25277 |   113   (3)|
--|  14 |    VIEW            |              |     4 |    20 |     8   (0)|
--|  15 |     UNION-ALL      |              |       |       |            |
--|  16 |      FILTER        |              |       |       |            |
--|  17 |       FAST DUAL    |              |     1 |       |     2   (0)|
--|  18 |      FILTER        |              |       |       |            |
--|  19 |       FAST DUAL    |              |     1 |       |     2   (0)|
--|  20 |      FILTER        |              |       |       |            |
--|  21 |       FAST DUAL    |              |     1 |       |     2   (0)|
--|  22 |      FILTER        |              |       |       |            |
--|  23 |       FAST DUAL    |              |     1 |       |     2   (0)|
--------------------------------------------------------------------------

Solution 2

(more for situation A) but also applicable to B) in this way ...)

I am now using some hybrid approach (combination of the 1. and 2. points in my question) and actually quite like it, because it also provides good debugging and encapsulation possibilities and the optimizer does not have to deal at all with finding the best strategy based on basically logically separated queries in a bigger query, e.g. on internal FILTER rules, which may be good or at worst incredibly more inefficient:

using this in the report

select * 
from table(my_report_data_func_sql(
  :val1,
  :val1_filter_type,
  :val2
))

where the table function is defined like this

create or replace function my_report_data_func_sql(
  p_val1              integer   default 1234,
  p_val1_filter_type  varchar2  default 'eq',
  p_val2              varchar2  default null
) return varchar2 is

  query varchar2(4000) := ' 
    with params as (  -- *: default param
        select
            ''||p_val1||'' p_val1,                            -- eq*
            '''||p_val1_filter_type||''' p_val1_filter_type,  -- [eq, all*, like, regexp]
            '''||p_val2||''' p_val2                           -- null*
        from dual
    )
    select x.*
    from
        params p  -- workaround for standalone-sql-debugging using "with" statement above
        join my_report_data_base_view x on (1=1)
    where 1=1  -- ease of filter expression adding below
    '          
    -- #### FILTER CRITERIAS are appended here ####

    -- val1-filter
    ||case p_val1_filter_type
        when 'eq' then '
        and val1 = p_val1
    '   when 'like' then '
        and val1 like p_val1
    '   when 'regexp' then '
        and regexp_like(val1,  p_val1)
    '   else '' end  -- all

  ;
begin
  return query;
end;
;

and would produce the following by example:

select * 
from table(my_report_data_func_sql(
  1234,
  'eq',
  'someval2'
))

/*
        with params as (  -- *: default param
            select
                1          p_val1,              -- eq*
                'eq'       p_val1_filter_type,  -- [eq, all*, like, regexp]
                'someval2' p_val2               -- null*
            from dual
        )
        select x.*
        from
            params p  -- workaround for standalone-sql-debugging using "with" statement above
            join my_report_data_base_view x on (1=1)
        where 1=1  -- ease of filter expression adding below
            and val1 = p_val1
*/

OTHER TIPS

Several features enable the optimizer to produce dynamic plans. The most common feature is FILTER operations, which should not be confused with filter predicates. A FILTER operation allows Oracle to enable or disable part of the plan at runtime based on a dynamic value. This feature normally works with bind variables, other types of dynamic queries may not use it.

Sample schema

create table sample_table
(
    indexed_val        varchar2(100),
    some_other_column1 varchar2(100),
    some_other_column2 varchar2(100),
    some_other_column3 varchar2(100)
);

insert into sample_table
select level, level, level, level
from dual
connect by level <= 100000;

create index sample_table_idx1 on sample_table(indexed_val);

begin
    dbms_stats.gather_table_stats(user, 'sample_table');
end;
/

Sample query using bind variables

explain plan for
select * from sample_table where :p_filter_type = 'all'
union all
select * from sample_table where :p_filter_type = 'eq'     and indexed_val = :p_indexed_val
union all
select * from sample_table where :p_filter_type = 'like'   and indexed_val like :p_indexed_val
union all
select * from sample_table where :p_filter_type = 'regexp' and regexp_like(indexed_val, :p_indexed_val);

select * from table(dbms_xplan.display(format => '-cost -bytes -rows'));

Sample plan

This demonstrates vastly different plans being used depending on input. A single = will use an INDEX RANGE SCAN, no predicate will use a TABLE ACCESS FULL. The regular expression also uses a full table scan since there is no way to index regular expressions. Although depending on the exact type of expressions it may be possible to enable useful indexing through function based indexes or Oracle Text indexes.

Plan hash value: 100704550

------------------------------------------------------------------------------
| Id  | Operation                             | Name              | Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                   | 00:00:01 |
|   1 |  UNION-ALL                            |                   |          |
|*  2 |   FILTER                              |                   |          |
|   3 |    TABLE ACCESS FULL                  | SAMPLE_TABLE      | 00:00:01 |
|*  4 |   FILTER                              |                   |          |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| SAMPLE_TABLE      | 00:00:01 |
|*  6 |     INDEX RANGE SCAN                  | SAMPLE_TABLE_IDX1 | 00:00:01 |
|*  7 |   FILTER                              |                   |          |
|   8 |    TABLE ACCESS BY INDEX ROWID BATCHED| SAMPLE_TABLE      | 00:00:01 |
|*  9 |     INDEX RANGE SCAN                  | SAMPLE_TABLE_IDX1 | 00:00:01 |
|* 10 |   FILTER                              |                   |          |
|* 11 |    TABLE ACCESS FULL                  | SAMPLE_TABLE      | 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(:P_FILTER_TYPE='all')
   4 - filter(:P_FILTER_TYPE='eq')
   6 - access("INDEXED_VAL"=:P_INDEXED_VAL)
   7 - filter(:P_FILTER_TYPE='like')
   9 - access("INDEXED_VAL" LIKE :P_INDEXED_VAL)
       filter("INDEXED_VAL" LIKE :P_INDEXED_VAL)
  10 - filter(:P_FILTER_TYPE='regexp')
  11 - filter( REGEXP_LIKE ("INDEXED_VAL",:P_INDEXED_VAL))

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow