Breaking down a large number of rows into smaller queries? Parallelism

https://stackoverflow.com/questions/18615066

27-06-2022
|

Domanda

I want to create a external application which will query one table from a large Oracle database.

The query will run daily and I am expecting to handle 30,000+ rows.

To break down the size of these rows, I would like to create a new thread/ process for each 10,000 rows that exist. So going by the above figure it would be 3 threads to process all those rows.

I don't want each thread to overlap each others row set so I know I will need to add a column within the table to act as a range marker, a row_position

Logic

Get row_count of data set in query parameters
Get first_row_pos

While (row_count > 10,000)
{
    Create thread with 10,000 rows starting from first_row_pos
    row_count == row_count - 10,000
    first_row_pos = first_row_pos + 10,000
}

create thread for remaining rows
all threads run their queries concurrently.

This is basic logic at the moment, however I do not know how feasible this is.

Is this a good way or is there a better way?

Can this be done through one database connection with each thread sharing or is it better to have a seperate db connection for each thread?

Any other advice welcome?

Note: I just realised a do while loop would be better if there is less than 10,000 rows in this case.

Thanks

Soluzione

Oralce provide a parallel hint for sutuations such as this where you have a full table scan or similar problem and want to make use of multiple cores to divide the workload. Further details here.

The syntax is very simple, you specify the table (or alias) and the number of cores (I usually leave as default) e.g.:

select /*+ parallel(a, default) */ *
  from table_a a

You can also use this with multiple tables e.g.

select /*+ parallel(a, default) parallel(b,default) */ *
  from table_a a, table_b b
 where a.some_id = b.some_id

Altri suggerimenti

A database connection is not thread-safe, so if you are going to query the database from several threads, you would have to have a separate connection for each of them. You can either create a connection or get them from a pool.

Before you implement your approach, take some time to analyze where is the time spent. Oracle overall is pretty good with utilizing multiple cores. And the database interaction is usually is the most time-consuming part. By splitting the query in three you might actually slow things down.

If indeed your application is spending most of the time performing calculations on that data, your best approach might be loading all data in a single thread and then splitting processing into multiple threads.

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow