Question

I want to select rows N through M from a table. I don't want to use ORDER BY because the table is huge: 38 million rows. I found a solution that suggests the following query:

SELECT *
FROM (SELECT suppliers2.*, ROWNUM rnum
      FROM (SELECT * FROM suppliers ORDER BY supplier_name) suppliers2
      WHERE ROWNUM <= 5)
WHERE rnum >= 3;

But since it has two nested SELECT statements and my table is very big (38 million rows), I wanted to know whether there is another way that is less taxing on the database. I could also use MINUS, but I again see a performance problem. Basically, I want to select the first million rows and write them to a file, then select the second million rows and write them to another file, and so on. Please help.

Solution

It's not clear to me why you need to page through the results in the first place. You apparently want to grab an arbitrary 1 million rows, put that data in one file, grab another arbitrary 1 million rows (ensuring that you don't grab the same row twice), put that in a second file, and repeat the process until you've generated 38 separate files. What benefit do you derive from issuing 38 separate SELECT statements rather than issuing a single SELECT statement and letting the caller simply write the first million rows that it fetches to one file and then write the second million rows that it fetches to a second file?
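A minimal sketch of that single-SELECT approach, written in Python against the generic DB-API cursor interface. Everything named here is an assumption for illustration: `export_in_chunks`, the `suppliers_NNN.csv` file names, the CSV format, and the tiny in-memory `sqlite3` table that stands in for the real Oracle table. With Oracle you would obtain the cursor from an Oracle driver instead, but the fetch loop would have the same shape.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_in_chunks(cursor, out_dir, rows_per_file, fetch_size=10_000):
    """Stream rows from an already-executed cursor into numbered CSV files.

    The query runs once; a new file is started every `rows_per_file` rows.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    writer = None
    rows_in_file = 0
    try:
        while True:
            # Fetch in modest batches so the client never holds the full result.
            batch = cursor.fetchmany(fetch_size)
            if not batch:
                break
            for row in batch:
                if handle is None or rows_in_file == rows_per_file:
                    if handle is not None:
                        handle.close()
                    # Hypothetical naming scheme: suppliers_001.csv, suppliers_002.csv, ...
                    path = out_dir / f"suppliers_{len(paths) + 1:03d}.csv"
                    handle = path.open("w", newline="")
                    writer = csv.writer(handle)
                    paths.append(path)
                    rows_in_file = 0
                writer.writerow(row)
                rows_in_file += 1
    finally:
        if handle is not None:
            handle.close()
    return paths


def demo_row_counts():
    """Export a 10-row stand-in table in files of 4 rows; return rows per file."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE suppliers (supplier_id INTEGER, supplier_name TEXT)")
    conn.executemany(
        "INSERT INTO suppliers VALUES (?, ?)",
        [(i, f"supplier {i}") for i in range(1, 11)],
    )
    cursor = conn.execute("SELECT * FROM suppliers")  # one query, no ORDER BY
    with tempfile.TemporaryDirectory() as tmp:
        paths = export_in_chunks(cursor, tmp, rows_per_file=4, fetch_size=3)
        counts = []
        for path in paths:
            with path.open(newline="") as handle:
                counts.append(sum(1 for _ in csv.reader(handle)))
    conn.close()
    return counts


if __name__ == "__main__":
    print(demo_row_counts())
```

For the real table, `rows_per_file` would be 1,000,000. The database then does a single pass over the data, and the split into files happens entirely on the client.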

Are you trying to generate the files in parallel from 38 separate worker processes? If so, it seems unlikely that you'll get much benefit from parallelizing the writes when the price is a substantial increase in the work the database has to do. I could envision a system where writes are slow on the client but easy to parallelize, reads on the server are very fast, and the database server has plenty of memory available for sorting, so that writing the files in parallel might be quicker. But there aren't many systems with those characteristics. If you do want to use parallelism, you'd generally be better served by letting the client issue a single SELECT to the database and allowing the database to run that SELECT statement in parallel.

If you are determined to select the results in pages, the query you posted should be the most efficient. The fact that there are nested SELECT statements isn't particularly relevant to the performance analysis: the query will only hit the table once. It may still be very expensive if it needs to fetch and sort all 38 million rows in order to determine which is the 3rd row and which is the 5th row. And it will likely get steadily slower as you ask for later pages of data. Fetching rows 37,000,001 - 38,000,000 will require, at a minimum, reading the entire table. That's one reason it's unlikely to be all that helpful to write the files in parallel: pulling the first few pages of data is likely to be so much more efficient than pulling the last page that you're going to be limited by that last query and by the time required to pull 38 million rows over the network.
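To put rough numbers on that growth, here is a back-of-the-envelope sketch in Python. The helper names `page_bounds` and `rows_produced_by_paging` are made up for illustration. The model is an assumption, and an optimistic one: fetching a page that ends at row M is taken to cost only the M rows that must come out of the inner query before the ROWNUM cutoff. If every page needs a full scan and sort, the real cost is higher (38 pages times 38 million rows read).

```python
def page_bounds(page, page_size):
    """Return the 1-based (first_row, last_row) pair for a given page number."""
    first = (page - 1) * page_size + 1
    last = page * page_size
    return first, last


def rows_produced_by_paging(total_rows, page_size):
    """Lower bound on the rows produced when every page is queried separately.

    Assumes page p costs at least `last_row` rows, because the inner query
    must yield that many rows before the upper ROWNUM filter stops it.
    """
    pages = -(-total_rows // page_size)  # ceiling division
    return sum(
        min(page_bounds(p, page_size)[1], total_rows)
        for p in range(1, pages + 1)
    )


if __name__ == "__main__":
    print(page_bounds(38, 1_000_000))
    print(rows_produced_by_paging(38_000_000, 1_000_000))
```

Under this model the 38 paged queries together produce at least 741,000,000 rows, compared with 38,000,000 for a single pass over the table.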

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow