Is this how non-bulk binding PL/SQL code should be translated to bulk-binding code, and is there ever a reason to forgo buk binding?

StackOverflow https://stackoverflow.com/questions/18155445

Domanda

(This is all Oracle 10g):

CREATE OR REPLACE FUNCTION bar(...)
IS
    v_first_type VARCHAR2(100) ;
    v_second_type VARCHAR2(100);

    CURSOR cur IS SELECT a,b FROM source_table ;
    v_a int;
    v_b char;
BEGIN
    OPEN cur;
    <<l_next>> --10G doesn't have the continue statement.
    LOOP
        FETCH cur INTO v_a, v_b ;
        EXIT WHEN cur%NOTFOUND ;

        --Ignore Record Case: ignore the record entirely
        IF a == -1 THEN
            -- do something
            GOTO l_next ; --10g doesn't have the continue statement.
        ELSE
            -- do something else
            v_first := 'SUCCESS' ;
        END IF;

        -- Transform Case:
        IF b == 'z' THEN
            -- do something
            v_second := 'something';
        ELSE
            -- do something
            v_second := 'something else';
        END IF;


        INSERT INTO report_table VALUES (v_first, v_second);
    END LOOP;
    CLOSE cur;
EXCEPTION 
    ...
END;

I'm at my first job out of college. I'm looking through some legacy code that looks like the general framework above (except it is several hundreds of lines long and uses much more complicated processing (no set-based solution possible)).

Its pulling lots of rows from one table into a cursor, looping through the cursor, making transformations to it, and inserting the results into a report table. The cursor does not insert every record -- if there is something wrong with the record or if it doesn't like it for whatever reason, it will skip the record without inserting it (see the GOTO statements).

Problem 1: The inserts are occurring one by one inside the loop, instead of doing a FORALL at the end, outside of the loop.

Problem 2: The cursor does not use BULK COLLECT.

On top of this, there is a stored procedure that has a cursor which again doesn't use BULK COLLECT, and it issues this function while looping through the records in the cursor. One commit is issued at the end of each record that's been looped over. There are no commits in the function I'm writing about here.

I want to rewrite the code to look like this:

CREATE OR REPLACE FUNCTION bar(...)
IS
    CURSOR cur IS SELECT a,b FROM source_table ;

    TYPE t_source IS TABLE OF cur%ROWTYPE INDEX BY PLS_INTEGER;
    TYPE t_report IS TABLE OF destination_table%ROWTYPE INDEX BY PLS_INTEGER;
    v_sources t_source;
    v_reports t_report
    v_report_inx INT := 0; -- To Prevent Sparse Collection
BEGIN
    OPEN cur;
    <<l_next>> --10G doesn't have the continue statement.
    LOOP
        FETCH cur BULK COLLECT INTO v_sources LIMIT 100 ;
        EXIT WHEN v_sources.count = 0 ;

        FOR i IN 1 .. v_sources LOOP
            --Ignore Record Case: ignore the record entirely
            IF v_sources(i).a == -1 THEN
                -- do something
                GOTO l_next ; --10g doesn't have the continue statement.
            ELSE
                -- do something else
                v_reports(v_report_inx).first := 'SUCCESS' ;
            END IF;

            -- Transform Case:
            IF v_sources(i).b == 'z' THEN
                -- do something
                v_reports(v_report_inx).second := 'something';
            ELSE
                -- do something
                v_reports(v_report_inx).second := 'something else';
            END IF;

            v_report_inx := v_report_inx + 1;
        END LOOP;


    END LOOP;

    FORALL i in 1 .. v_reports.count
            INSERT INTO report_table (first, second) VALUES (v_reports(i).first, v_reports(i).v_second);

    CLOSE cur;
EXCEPTION 
    ...
END;

The material changes are 1) using BULK COLLECT into an associative array, and 2) using FORALL out of a different associative array.

I have two questions:

1) Based on the framework I provided in the first snippet, are my alterations the most superior way to do it? Would you do it another way?

2) Is there any reason that I am not thinking of which would make someone not use BULK COLLECT and FORALL? Perhaps some complicated processing that I haven't realized yet in the legacy code? The code was originally formed in 2002 (so I would assume 8i or 9i) but has been updated since then. 9i had bulk binding. 8i also had bulk binding. Both had associative arrays. So I feel like there must be a reason why they hadn't been using bulk binding.

È stato utile?

Soluzione

There are nothing wrong with general idea of migrating to bulk collect. Bulk operations just minimizes number of context switches and database round trips.

There are only one generic thing which wrong with your code. LIMIT clause prevents overusing memory by bulk operations, so it's proper decision to use it with bulk collect. But v_reports grows uncontrolled. Therefore move bulk insert inside a loop and clear v_reports afterwards.

There are some inaccuracies in altered code. Please review code fragment below, comments in /**/ style are mine.

CREATE OR REPLACE FUNCTION bar(...)
IS
    CURSOR cur IS SELECT a,b FROM source_table ;

    TYPE t_source IS TABLE OF cur%ROWTYPE INDEX BY PLS_INTEGER;
    TYPE t_report IS TABLE OF destination_table%ROWTYPE INDEX BY PLS_INTEGER;
    v_sources t_source;
    v_reports t_report

    /* 1. correct type is same as type of index
       2. There are nothing wrong with sparse collections, but a separate 
           counter which incremented continuously needed for t_report.
    */
    v_report_inx PLS_INTEGER := 0; -- To Prevent Sparse Collection

BEGIN
    OPEN cur;
    <<l_next>> --10G doesn't have the continue statement.
    LOOP
        FETCH cur BULK COLLECT INTO v_sources LIMIT 100 ;

        /* On last step v_sources.count < 100, not exactly 0.
           Also if there are no elements then no processing done, 
           so check at the end of loop.
        EXIT WHEN v_sources.count = 0;
        */

        /* correct way is to loop from 1 to count
          (.last and .first not usable because both is null for empty array)
        */
        FOR i IN 1  .. v_sources.count LOOP

            v_report_inx := v_report_inx + 1;

            --Ignore Record Case: ignore the record entirely
            IF v_sources(i).a = -1 THEN
                -- do something
                GOTO l_next ; --10g doesn't have the continue statement.
            END IF;

            /* No need for ELSE here, just execution continues */
             -- do something else
             v_reports(v_report_inx).first := 'SUCCESS' ;


            -- Transform Case:
            IF v_sources(i).b = 'z' THEN
                -- do something
                v_reports(v_report_inx).second := 'something';
            ELSE
                -- do something
                v_reports(v_report_inx).second := 'something else';
            END IF;

        END LOOP;


        /* Use "indicies of" construct to deal with sparsed collections */
        FORALL i in indices of v_reports
              /* t_report already declared with %ROWTYPE 
                 so just insert entire row, it works faster */
              INSERT INTO report_table VALUES v_reports(i);

        /* Cleanup after insert */
        v_reports.delete;

        /* If number of selected records less than LIMIT then last row reached. */
        EXIT WHEN v_sources.count < 100;

    END LOOP;


    CLOSE cur;
EXCEPTION
    ...
END;

Update

Thanks to @jonearles. He encouraged me to test performance for different approaches to handle cursors in PL/SQL.

Below is results for test with 3 000 000 records. It's clearly that migration from plain explicit cursor to bulk collect approach give a real performance gain.
At the same time explicit cursor with bulk collect option and properly choosed LIMIT always outperform implicit cursor, but difference between them lies in acceptable bounds.

Variant name           | Time (sec)
-------------------------------------
bulk_cursor_limit_500  |  1.26
bulk_cursor_limit_100  |  1.52
bulk_unlimited         |  1.75
implicit_cursor        |  1.83
plain_cursor           | 27.20

Below is code for test (limited SQLFiddle example here)

Scheme setup

drop table t
/
drop table log_run
/
create table t(a number, b number)
/
insert into t select level, level from dual connect by level <= 3000000
/

create table log_run(id varchar2(30), seconds number);
/

delete log_run
/

Single test run

declare
  cursor test_cur is
    select a, b from t;

  test_rec test_cur%rowtype;
  counter    number;

  vStart timestamp;
  vEnd timestamp;
  vTimeFormat varchar2(30) := 'SSSSS.FF9';
begin

  vStart := systimestamp;

  open test_cur;
  loop
    fetch test_cur into test_rec;
    exit when test_cur%notfound;
    counter := counter + 1;
  end loop;
  close test_cur;

  vEnd := systimestamp;
  insert into log_run(id, seconds) 
    values('plain_cursor', 
             to_number(to_char(vEnd,vTimeFormat))
             -
             to_number(to_char(vStart,vTimeFormat)) 
          )
  ;

end;
/

--Implicit cursor
--0.2 seconds
declare
  test_rec   t%rowtype;
  counter    number;

  vStart timestamp;
  vEnd timestamp;
  vTimeFormat varchar2(30) := 'SSSSS.FF9';
begin

  vStart := systimestamp;

  for c_test_rec in (select a, b from t) loop
    test_rec.a := c_test_rec.a;
    test_rec.b := c_test_rec.b;
    counter := counter + 1;
  end loop;

  vEnd := systimestamp;
  insert into log_run(id, seconds) 
    values('implicit_cursor', 
             to_number(to_char(vEnd,vTimeFormat))
             -
             to_number(to_char(vStart,vTimeFormat)) 
          )
  ;

end;
/

declare
  cursor test_cur is
    select a, b from t;

  type t_test_table is table of t%rowtype;

  test_tab   t_test_table;
  counter    number;

  vStart timestamp;
  vEnd timestamp;
  vTimeFormat varchar2(30) := 'SSSSS.FF9';
begin

  vStart := systimestamp;

  open test_cur;
  loop
    fetch test_cur bulk collect into test_tab limit 100;
    for i in 1 .. test_tab.count loop
      counter := counter + 1;
    end loop;

    exit when test_tab.count < 100;
  end loop;

  close test_cur;

  vEnd := systimestamp;
  insert into log_run(id, seconds) 
    values('bulk_cursor_limit_100', 
             to_number(to_char(vEnd,vTimeFormat))
             -
             to_number(to_char(vStart,vTimeFormat)) 
          )
  ;

end;
/


declare
  cursor test_cur is
    select a, b from t;

  type t_test_table is table of t%rowtype;

  test_tab   t_test_table;
  counter    number;

  vStart timestamp;
  vEnd timestamp;
  vTimeFormat varchar2(30) := 'SSSSS.FF9';
begin

  vStart := systimestamp;

  open test_cur;
  loop
    fetch test_cur bulk collect into test_tab limit 500;
    for i in 1 .. test_tab.count loop
      counter := counter + 1;
    end loop;

    exit when test_tab.count < 500;
  end loop;

  close test_cur;

  vEnd := systimestamp;
  insert into log_run(id, seconds) 
    values('bulk_cursor_limit_500', 
             to_number(to_char(vEnd,vTimeFormat))
             -
             to_number(to_char(vStart,vTimeFormat)) 
          )
  ;

end;
/

declare

  type t_test_table is table of t%rowtype;

  test_tab   t_test_table;
  counter    number;

  vStart timestamp;
  vEnd timestamp;
  vTimeFormat varchar2(30) := 'SSSSS.FF9';
begin

  vStart := systimestamp;

  select * bulk collect into test_tab from t;

  for i in 1 .. test_tab.count loop
    counter := counter + 1;
  end loop;

  vEnd := systimestamp;
  insert into log_run(id, seconds) 
    values('bulk_unlimited', 
             to_number(to_char(vEnd,vTimeFormat))
             -
             to_number(to_char(vStart,vTimeFormat)) 
          )
  ;

end;
/

Select average results

select * from ( 
  select lr.id, trunc(avg(seconds),2) seconds  
  from log_run lr group by lr.id) 
  order by seconds
)

Altri suggerimenti

I'd rewrite this so that GOTO was not used (I guess I'm just an old unreconstructed structured programmer at heart :-). I'd also get rid of the explicit cursor and use a cursor FOR loop, which starting with 10g will often be bulk-bound behind the scenes. Try:

CREATE OR REPLACE FUNCTION bar(...)
IS
    v_first_type VARCHAR2(100) ;
    v_second_type VARCHAR2(100);
BEGIN
    <<OUTER_LOOP>>
    FOR aRow In (SELECT A, B FROM SOURCE_TABLE)
    LOOP

      <<INNER_LOOP>>
      LOOP  -- This loop is used to allow us to skip the later INSERT, and
            -- will only be passed through once for each row returned by
            -- the FOR loop.
        --Ignore Record Case: ignore the record entirely
        IF aRow.A == -1 THEN
            -- do something
            EXIT INNER_LOOP;  -- rather than GOTO
        ELSE
            -- do something else
            v_first := 'SUCCESS' ;
        END IF;

        -- Transform Case:
        IF aRow.B == 'z' THEN
            -- do something
            v_second := 'something';
        ELSE
            -- do something
            v_second := 'something else';
        END IF;


        INSERT INTO report_table VALUES (v_first, v_second);

        EXIT INNER_LOOP;  -- the "loop" is used to allow the INSERT to be
                          -- skipped and thus we don't ever want to go back
                          -- to the top
      END LOOP;  -- INNER_LOOP
    END LOOP;  -- OUTER_LOOP
EXCEPTION 
    ...
END;

Note the use of the inner loop with an explicit exit to keep top-down flow while allowing explicit control of looping.

I also recommend running this code under the Oracle profiler to get an understanding of which lines of code are taking up the most time. Trying to optimize code by guessing where the bottlenecks are is a waste of time. Until you've profiled it you're guessing - and you'd never guess how often I've guessed wrong. :-) Code spends its time in the damndest places...

Share and enjoy.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top