Question

I have a heavy query that spools data into a CSV file which is sent to users. I have manually opened parallel sessions and run the query with a different filter condition in each, so that I can join all the spooled files into a single file at the end, reducing the time to generate the data (it usually takes about 10 hours; with parallel sessions it takes 2.5-3 hours).

My question is: how can I automate this so that a script finds max(agreementid) and then distributes the work across X spool calls, generating X files with, say, at most 100,000 records each?

Additional explanation: I guess my question was not very clear. I will try to explain again.

  1. I have a table/view with a large amount of data.
  2. I need to spool this data into a CSV file.
  3. It takes a humongous amount of time to spool the CSV file.
  4. I run parallel spools as follows: a) select ... from ... where agreementid between 1 and 1000000; b) select ... from ... where agreementid between 1000001 and 2000000; and so on, then spool each one individually in a separate session.
  5. This lets me generate multiple files which I can then stitch together and share with users.
  6. I need a script (DOS-based or AIX-based, I guess) which will find the min and max agreementid in my table, create the spooling scripts automatically, and execute them through separate SQL sessions so that the files are generated automatically.

Not sure whether I have made myself clear enough. Thanks guys for replying to my earlier question, but that was not what I was looking for.


Solution

It's a bit unclear what you want, but I think you want a query that finds the low/high range of agreement_ids for X groups (buckets) of IDs. If so, try something like this (using four buckets in this example):

select bucket, min(agreement_id), max(agreement_id), count(1)
from (
  select agreement_id, ntile(4) over (order by agreement_id) bucket
  from my_table
)
group by bucket;

Edit: If your problem is the hassle of spooling multiple queries and combining the output, I would rather opt for creating a single materialized view (using PARALLEL in the underlying query on the driving table) and refreshing it (complete refresh, atomic_refresh => false) when needed. Once refreshed, simply extract from the snapshot table (to a CSV or whatever format you want).
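A minimal sketch of that approach, assuming the base table is called agreements; the materialized view name and the parallel degree of 8 are just illustrative choices:

create materialized view agreements_mv
  build immediate
  refresh complete on demand
as
select /*+ parallel(a 8) */ *
from agreements a;

-- complete, non-atomic refresh when new data is needed
begin
  dbms_mview.refresh(list => 'AGREEMENTS_MV', method => 'C', atomic_refresh => false);
end;
/

-- then spool straight from the snapshot
select * from agreements_mv;

With atomic_refresh => false, the complete refresh truncates the snapshot and reloads it with direct-path inserts, which is usually much faster than the default delete-and-insert refresh.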

OTHER TIPS

There might be a simpler way, but this generates four 'buckets' of IDs, and you could plug the min and max values into your parametrized filter condition:

select bucket, min(agreementid) as min_id, max(agreementid) as max_id
from (
    select agreementid,
        case when rn between 1 and cn / 4 then 1
            when rn between (cn / 4) - 1 and 2 * (cn / 4) then 2
            when rn between (2 * cn / 4) - 1 and 3 * (cn / 4) then 3
            when rn between (3 * cn / 4) - 1 and cn then 4
        end as bucket
    from (
        select agreementid, rank() over (order by agreementid) as rn,
            count(*) over () as cn from agreements
    )
)
group by bucket;

If you wanted an upper limit for each bucket rather than a fixed number of buckets, then you could do:

select floor(rn / 100000), min(agreementid) as min_id, max(agreementid) as max_id
from (
    select agreementid, rank() over (order by agreementid) as rn
    from agreements
)
group by floor(rn / 100000);

And then pass each min/max to a SQL script, e.g. from a shell script calling SQL*Plus. The bucket number could be passed as well and be used as part of the spool file name, via a positional parameter.
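As an illustration only, a sketch of such a SQL*Plus script; the file name extract_bucket.sql, the formatting options and the column list are all assumptions. It takes the bucket number, minimum ID and maximum ID as positional parameters &1, &2 and &3:

-- extract_bucket.sql: run as, e.g., sqlplus -s user/pass @extract_bucket.sql 1 1 1000000
set heading off feedback off pagesize 0 linesize 32767 trimspool on
spool agreements_&1..csv

select agreementid || ',' || some_other_column   -- substitute your real column list here
from agreements
where agreementid between &2 and &3;

spool off
exit

A shell script could then run the bucket query, read each min/max pair, and launch one such sqlplus session per bucket in the background, concatenating the resulting files once they all finish.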

I'm curious about what you've identified as the bottleneck though; have you tried running it as a parallel query inside the database, with a /*+ PARALLEL */ hint?
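For example, something along these lines (the degree of 8 and the column names are only placeholders):

select /*+ parallel(a 8) */ agreementid, col1, col2
from agreements a;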

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow