Question

I'm running a query against a table in a PostgreSQL database on a remote machine. The table has around 30 child tables, set up using PostgreSQL's partitioning capability.

The query will return a large result set, something around 1.8 million rows.

In my code I use Spring's JDBC support, specifically the JdbcTemplate.query method, but my RowCallbackHandler is never called.

My best guess is that the PostgreSQL JDBC driver (I use version 8.3-603.jdbc4) is accumulating the entire result in memory before calling my code. I thought the fetchSize setting would control this, but I tried it and nothing changed. I set it as the PostgreSQL manual recommends.

This query worked fine when I used Oracle XE, but I'm trying to migrate to PostgreSQL for its partitioning feature, which is not available in Oracle XE.

My environment:

  • PostgreSQL 8.3
  • Windows Server 2008 Enterprise 64-bit
  • JRE 1.6 64-bit
  • Spring 2.5.6
  • PostgreSQL JDBC Driver 8.3-603

Solution

In order to use a cursor to retrieve data, you have to set the ResultSet type to ResultSet.TYPE_FORWARD_ONLY (the default) and set autocommit to false, in addition to setting a fetch size. Those steps are covered in the documentation you linked to, but you didn't explicitly mention that you did them.
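In plain JDBC terms, the three requirements look roughly like this (the connection setup, SQL, and class name are placeholders, not from the original question, and try-with-resources is used for brevity even though the asker is on JRE 1.6):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of the conditions under which the PostgreSQL driver streams rows
// through a cursor instead of buffering the whole result set in memory.
public class CursorFetchSketch {
    static long streamRows(Connection conn, String sql) throws SQLException {
        conn.setAutoCommit(false);                  // 1. autocommit must be off
        long count = 0;
        try (Statement st = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,        // 2. forward-only (the default)
                ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(100);                   // 3. a non-zero fetch size
            try (ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    count++;                        // process each row here
                }
            }
        }
        return count;
    }
}
```

If any one of the three conditions is missing, the driver silently falls back to fetching everything at once.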

Be careful with PostgreSQL's partitioning scheme. It interacts badly with the query optimizer and can cause massive performance problems where there should be none (depending on the specifics of your data). In any case, is your table only 1.8M rows? There is no reason it would need to be partitioned based on size alone, provided it is appropriately indexed.

OTHER TIPS

I'm betting that there's not a single client of your app that needs 1.8M rows all at the same time. You should think of a sensible way to chunk the results into smaller pieces and give users the chance to iterate through them.

That's what Google does. When you do a search there might be millions of hits, but it returns results 25 at a time, on the assumption that you'll find what you want on the first page.
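Chunking can be as simple as appending LIMIT/OFFSET to a query with a stable ordering; a hypothetical helper (names are illustrative, not from the original answer):

```java
// Hypothetical helper that turns a zero-based page index into a
// LIMIT/OFFSET clause. The base query must have a deterministic
// ORDER BY, or pages can overlap or skip rows between requests.
public class Pager {
    static String pageQuery(String orderedSql, int pageSize, int pageIndex) {
        int offset = pageSize * pageIndex;   // rows to skip before this page
        return orderedSql + " LIMIT " + pageSize + " OFFSET " + offset;
    }
}
```

For example, page 2 with a page size of 25 yields `... LIMIT 25 OFFSET 50`.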

If it's not a client, and the results are being massaged in some way, I'd recommend letting the database crunch all those rows and simply return the result. It makes no sense to return 1.8M rows just to do a calculation on the middle tier.
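For instance, an average can be computed server-side so that only one row crosses the network (the table and column names here are made up for illustration):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: let the database do the arithmetic over 1.8M rows and return
// a single aggregated value, instead of shipping every row to the
// middle tier just to compute the same number there.
public class AggregateSketch {
    static double averageInDatabase(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT avg(amount) FROM measurements")) {
            rs.next();
            return rs.getDouble(1);   // one row back, not 1.8M
        }
    }
}
```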

If neither of those apply, you've got a real problem. Time to rethink it.

After reading the later responses, it sounds to me like this is more of a reporting problem that ought to be crunched in batch, or calculated in real time and stored in tables separate from your transactional system. There's no way that bringing 1.8M rows to the middle tier to calculate moving averages can scale.

I'd recommend a change of perspective: start thinking about it as a reporting solution.

The fetchSize property worked as described in the PostgreSQL manual.

My mistake was that I was setting autocommit = false on a connection obtained from the connection pool that was not the connection actually being used by the prepared statement.

Thanks for all the feedback.

I did everything above, but one last piece was still needed: make sure the call is wrapped in a transaction, and mark the transaction read-only so that no rollback state is required.

I added this: @Transactional(readOnly = true)
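Putting the pieces together, the working shape looks roughly like this (the class, query, and handler body are illustrative, not the asker's original code):

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.annotation.Transactional;

// Sketch of the final configuration: the read-only transaction keeps
// autocommit off on the SAME pooled connection the statement uses, and
// the fetch size lets the driver stream rows through a cursor, so the
// RowCallbackHandler fires per row instead of after buffering everything.
public class RowExporter {
    private final JdbcTemplate jdbcTemplate;

    public RowExporter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
        this.jdbcTemplate.setFetchSize(100);
    }

    @Transactional(readOnly = true)
    public void export() {
        jdbcTemplate.query("SELECT * FROM big_table", rs -> {
            // RowCallbackHandler: invoked once for each streamed row
            long id = rs.getLong("id");
        });
    }
}
```

Note that the @Transactional proxy only takes effect when export() is called through a Spring-managed bean.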

Cheers.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow