Question

I have a large dataset to return from Oracle, too large to fit into memory.

I need to re-walk the entire dataset many times.

Because of the size of the dataset, rerunning the query all the time is obviously not an option.

Is there a way to access a scrollable Cursor from Oracle? I'm using cx_Oracle.

In PostgreSQL, I can just do cursor.scroll(0, mode='absolute') to send the cursor back to the beginning of the dataset.

Google suggests that OCI 8 supports scrollable client-side cursors, and turns up C examples for constructing such a cursor. The cx_Oracle documentation doesn't show a Cursor.scroll method, even though it's specified as part of DB-API 2.0.

Am I going to be stuck using pyodbc or something?

No correct solution

OTHER TIPS

Short answer, no.

Longer answer...

Although Cursor.scroll() is specified in PEP 249, it belongs to the Optional DB API Extensions section. To quote:

As with all DB API optional features, the database module authors are free to not implement these additional attributes and methods (using them will then result in an AttributeError) or to raise a NotSupportedError in case the availability can only be checked at run-time.

At the time of writing, this simply hasn't been implemented in cx_Oracle, though, as you say, OCI itself makes it possible.
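Because a driver may take either route, portable code has to cope with both outcomes. A minimal sketch, assuming you pass in the DB-API module itself (for example psycopg2 or cx_Oracle) so that its NotSupportedError can be caught; the helper name rewind is hypothetical:

```python
def rewind(cursor, driver):
    """Try to move a DB-API cursor back to its first row.

    Returns True if the driver scrolled the cursor, or False if it cannot,
    in which case the caller has to execute the query again.
    """
    # PEP 249 lets a driver leave scroll() out entirely (AttributeError) ...
    scroll = getattr(cursor, "scroll", None)
    if scroll is None:
        return False
    try:
        scroll(0, mode="absolute")
    except driver.NotSupportedError:
        # ... or raise NotSupportedError once the method is actually called.
        return False
    return True
```

When it returns False, the only fallback is to run the query again, which is exactly the cost the options below try to reduce.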

You mention that the dataset is too large to fit into memory; I assume you mean client-side? Have you considered letting the database shoulder the burden? You don't say what your query is, how complicated it is, or how much data is actually returned, but you could consider caching the result set. There are a few options here, and both the database and the OS will do some caching in the background anyway; however, the main one would be to use the RESULT_CACHE hint:

```sql
select /*+ result_cache */ ...
  from ...
```

The amount of memory available to the cache is governed by the RESULT_CACHE_MAX_SIZE initialization parameter, whose value you can find by running the following query:

```sql
select *
  from v$parameter
 where name = 'result_cache_max_size'
```
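The same check can be run from Python through an ordinary cursor. This sketch assumes a cx_Oracle cursor (any DB-API cursor that accepts named :name binds would do), uses a bind variable rather than a literal, and the function name is hypothetical. Since v$parameter.value is a string column, the setting comes back as text:

```python
def result_cache_max_size(cursor):
    """Return the RESULT_CACHE_MAX_SIZE setting as text, or None if not found."""
    cursor.execute(
        "select value from v$parameter where name = :name",
        {"name": "result_cache_max_size"},  # bind variable instead of a literal
    )
    row = cursor.fetchone()
    # v$parameter.value is a VARCHAR2 column, so the value is returned as a string
    return row[0] if row else None
```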

How useful this is depends on how much work your database is doing, the value of that parameter, and so on. There is plenty of documentation on the subject.
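If caching makes re-running the query cheap enough, re-walking the data becomes a matter of re-executing it and streaming the rows in batches, so only one batch is ever held in client memory. A minimal sketch against any DB-API cursor; walk, big_table and its columns are hypothetical, and outside Oracle the /*+ result_cache */ hint is just a comment:

```python
def walk(cursor, sql, batch_size=1000):
    """Execute sql and yield its rows, holding at most one batch in memory.

    Calling it again re-executes the query, which is how the data is
    "re-walked" when the cursor itself cannot scroll.
    """
    cursor.arraysize = batch_size  # fetchmany() with no argument uses arraysize
    cursor.execute(sql)
    while True:
        rows = cursor.fetchmany()
        if not rows:
            break
        yield from rows


# Hypothetical table and columns; the hint only means something to Oracle.
QUERY = "select /*+ result_cache */ id, payload from big_table order by id"
```

Each pass is then simply "for row in walk(cursor, QUERY): ...". With cx_Oracle, arraysize also sets how many rows are fetched per round trip, so it is worth tuning together with the batch size.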

Another option might be to use a global temporary table (GTT) to persist the results. Use your query to insert the data into the GTT; your select then becomes:

```sql
select * from temp_table
```

I can see one main benefit: you'll be able to access the table by row index, just as you wish to do with the scrollable cursor. Declare your table with an additional column to hold the index:

```sql
create global temporary table temp_table (
    i number
  , col1 ...
  , primary key (i)
    ) on commit delete rows
```

Then populate it using the ROWNUM pseudocolumn to create the same zero-based "index" you would have in Python:

```sql
insert into temp_table
select rownum - 1, c.*
  from ( select ... from ... ) c
```

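To access row 0 you can then add the predicate WHERE i = 0, or to "restart" the cursor, you can simply re-select from temp_table. Because the data is stored "flat", re-reading it should be a lot quicker than re-running the original query. Bear in mind that with ON COMMIT DELETE ROWS the table is emptied at every commit, so either keep the work in one transaction or declare it ON COMMIT PRESERVE ROWS instead.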
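The GTT workflow can be tried out locally. The sketch below uses Python's built-in sqlite3 purely as a stand-in for Oracle so that it runs anywhere: a SQLite temp table plays the part of the GTT (it is visible only to the connection that created it, and it survives commits, so it behaves more like ON COMMIT PRESERVE ROWS), itertools.count() replaces ROWNUM - 1, and the ? placeholders are SQLite's style, whereas cx_Oracle uses :name binds. The helper names and the temp_table layout are assumptions:

```python
from itertools import count


def materialize(conn, source_sql, columns, batch_size=1000):
    """Copy the rows of source_sql into temp_table, numbering them from 0.

    Mirrors the Oracle steps: create the temporary table, then
    insert into temp_table select rownum - 1, c.* from (...) c.
    """
    cols = ", ".join(columns)  # trusted column names only; they are not escaped
    marks = ", ".join("?" * (len(columns) + 1))  # one placeholder per column, plus i
    conn.execute("drop table if exists temp_table")
    # A SQLite temp table is visible only to this connection, much like a GTT's rows.
    conn.execute(f"create temp table temp_table (i integer primary key, {cols})")
    src = conn.cursor()
    src.arraysize = batch_size
    src.execute(source_sql)
    index = count()  # 0, 1, 2, ... stands in for ROWNUM - 1
    while rows := src.fetchmany():
        conn.executemany(
            f"insert into temp_table values ({marks})",
            [(next(index), *row) for row in rows],
        )


def row_at(conn, i):
    """Random access by position: the WHERE i = :i lookup."""
    return conn.execute("select * from temp_table where i = ?", (i,)).fetchone()


def rewalk(conn, batch_size=1000):
    """'Restart the cursor' by re-selecting from the flat copy, in index order."""
    cur = conn.cursor()
    cur.arraysize = batch_size
    cur.execute("select * from temp_table order by i")
    while rows := cur.fetchmany():
        yield from rows
```

Here row_at(conn, 0) is the WHERE i = 0 lookup, and every call to rewalk(conn) starts again from row 0 without touching the source query.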

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow