How can I speed up row_number in Oracle?

https://stackoverflow.com/questions/827108

05-07-2019
|

Question

I have a SQL query that looks something like this:

SELECT * FROM(
    SELECT
        ...,
        row_number() OVER(ORDER BY ID) rn
    FROM
        ...
) WHERE rn between :start and :end

Essentially, it's the ORDER BY part that's slowing things down. If I were to remove it, the EXPLAIN cost goes down by an order of magnitude (over 1000x). I've tried this:

SELECT 
    ...
FROM
    ...
WHERE
    rownum between :start and :end

But this doesn't give correct results. Is there any easy way to speed this up? Or will I have to spend some more time with the EXPLAIN tool?

Solution

ROW_NUMBER is quite inefficient in Oracle.

See the article in my blog for performance details:

Oracle: ROW_NUMBER vs ROWNUM

For your specific query, I'd recommend you to replace it with ROWNUM and make sure that the index is used:

SELECT  *
FROM    (
        SELECT  /*+ INDEX_ASC(t index_on_column) NOPARALLEL_INDEX(t index_on_column) */
                t.*, ROWNUM AS rn
        FROM    table t
        ORDER BY
                column
        )
WHERE rn >= :start
      AND rownum <= :end - :start + 1

This query will use COUNT STOPKEY

Also either make sure you column is not nullable, or add WHERE column IS NOT NULL condition.

Otherwise the index cannot be used to retrieve all values.

Note that you cannot use ROWNUM BETWEEN :start and :end without a subquery.

ROWNUM is always assigned last and checked last, that's way ROWNUM's always come in order without gaps.

If you use ROWNUM BETWEEN 10 and 20, the first row that satisifies all other conditions will become a candidate for returning, temporarily assigned with ROWNUM = 1 and fail the test of ROWNUM BETWEEN 10 AND 20.

Then the next row will be a candidate, assigned with ROWNUM = 1 and fail, etc., so, finally, no rows will be returned at all.

This should be worked around by putting ROWNUM's into the subquery.

OTHER TIPS

Looks like a pagination query to me.

From this ASKTOM article (about 90% down the page):

You need to order by something unique for these pagination queries, so that ROW_NUMBER is assigned deterministically to the rows each and every time.

Also your queries are no where near the same so I'm not sure what the benefit of comparing the costs of one to the other is.

Is your ORDER BY column indexed? If not that's a good place to start.

Part of the problem is how big is the 'start' to 'end' span and where they 'live'. Say you have a million rows in the table, and you want rows 567,890 to 567,900 then you are going to have to live with the fact that it is going to need to go through the entire table, sort pretty much all of that by id, and work out what rows fall into that range.

In short, that's a lot of work, which is why the optimizer gives it a high cost.

It is also not something an index can help with much. An index would give the order, but at best, that gives you somewhere to start and then you keep reading on until you get to the 567,900th entry.

If you are showing your end user 10 items at a time, it may be worth actually grabbing the top 100 from the DB, then having the app break that 100 into ten chunks.

Spend more time with the EXPLAIN PLAN tool. If you see a TABLE SCAN you need to change your query.

Your query makes little sense to me. Querying over a ROWID seems like asking for trouble. There's no relational info in that query. Is it the real query that you're having trouble with or an example that you made up to illustrate your problem?

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow