Why is pagination so resource-expensive?

https://stackoverflow.com/questions/29580

09-06-2019
|

Question

It's one of those things that seems to have an odd curve where the more I think about it, the more it makes sense. To a certain extent, of course. And then it doesn't make sense to me at all.

Care to enlighten me?

Solution

Because in most cases you've got to sort your results first. For example, when you search on Google, you can view only up to 100 pages of results. They don't bother sorting by page-rank beyond 1000 websites for given keyword (or combination of keywords).

Pagination is fast. Sorting is slow.

OTHER TIPS

Lubos is right, the problem is not the fact that you are paging (which takes a HUGE amount of data off the wire), but that you need to figure out what is actually going on the page..

The fact that you need to page implies there is a lot of data. A lot of data takes a long time to sort :)

This is a really vague question. We'd need a concrete example to get a better idea of the problem.

This question seems pretty well covered, but I'll add a little something MySQL specific as it catches out a lot of people:

Avoid using SQL_CALC_FOUND_ROWS. Unless the dataset is trivial, counting matches and retrieving x amount of matches in two separate queries is going to be a lot quicker. (If it is trivial, you'll barely notice a difference either way.)

I thought you meant pagination of the printed page - that's where I cut my teeth. I was going to enter a great monologue about collecting all the content for the page, positioning (a vast number of rules here, constrait engines are quite helpful) and justification... but apparently you were talking about the process of organizing information on webpages.

For that, I'd guess database hits. Disk access is slow. Once you've got it in memory, sorting is cheap.

Of course sorting on a random query takes some time, but if you're having problems with the same paginated query being used regulary, there's either something wrong with the database setup (improperly indexing/none at all, too little memory etc. I'm not a db-manager) or you're doing pagination seriously wrong:

Terribly wrong: e.g. doing select * from hugetable where somecondition; into an array getting the page count with the array.length pick the relevant indexes and dicard the array - then repeating this for each page... That's what I call seriously wrong.

The better solution two queries: one getting just the count then another getting results using limit and offset. (Some proprietary, nonstandard-sql server might have a one query option, I dunno)

The bad solution might actually work quite okay in on small tables (in fact it's not unthinkable that it's faster on very small tables, because the overhead of making two queries is bigger than getting all rows in one query. I'm not saying it is so...) but as soon as the database begins to grow the problems become obvious.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow