Question

Using PostgreSQL 12, I have the requirement that some data needs to be "refreshed" each night. That means the database has to do some calculations based on the current date and write the results back to the table. That works in general; however, the way it works does not seem ideal.

Basically, there is a PLV8 DO block like this:

DO $$

// Fetch the ids of all items, then refresh each one in turn
var myData = plv8.execute('SELECT id FROM my_table');
for (let i = 0; i < myData.length; i++) {
    plv8.execute('SELECT my_function_which_refreshes_lots_of_data($1)', [myData[i].id]);
}
$$ LANGUAGE plv8;

my_function_which_refreshes_lots_of_data takes the id of one item and does all the work for it. It has to be done this way because refreshing must sometimes also be triggerable for single rows.
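For context, a minimal sketch of the shape such a function might have; the body, the refreshed_value column, and the some_calculation helper are hypothetical placeholders, since the real refresh logic is application-specific:

-- Hypothetical shape of the per-row refresh function; only the
-- signature matters for the question above.
CREATE OR REPLACE FUNCTION my_function_which_refreshes_lots_of_data(item_id integer)
RETURNS void AS $func$
    -- Placeholder body: recompute whatever depends on the current date
    UPDATE my_table
    SET refreshed_value = some_calculation(current_date)
    WHERE id = item_id;
$func$ LANGUAGE sql;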

The problem is that this has somehow gotten out of control. Currently, the query runs for more than an hour, consumes 99% of the memory, fills up the swap space, and spends most of its time in status "D" - "disk sleep (uninterruptible)". The duration of the query is not a problem in itself, since it usually runs overnight, triggered by a cron job. However, the memory consumption and the process status (basically something like a zombie) are, or might become, a problem.

So, my idea was this: As far as I know, PostgreSQL keeps all the changes to tables in memory while the query runs and writes them out only at the end. Would it be possible to manually trigger a "write to disk what you've got so far and free the memory"?

I would do that in my code after each plv8.execute('SELECT my_function_which_refreshes_lots_of_data($1)', [myData[i].id]); call. Of course, if the process crashes, that would result in an inconsistent state: part of the items would have been processed by my_function_which_refreshes_lots_of_data, while the rest remained as before. However, that's better than no item being written at all, and definitely better than a query that runs this long and consumes all the memory.

I know I could try to split the work up on the client side somehow. However, that would be less convenient for me for several reasons.


Solution

No, PostgreSQL does not keep the changes of in-progress transactions cached in memory. They are written to disk as usual.

All that happens at commit time is that the write-ahead log (WAL) and the commit log are flushed to disk.

So you must investigate further to find the cause of your I/O overload.

Useful lines of investigation are:

  • use pg_stat_statements with pg_stat_statements.track = all to find the SQL statements that cause the load.

  • set track_io_timing = on to see which queries contribute most to the I/O load (a setup sketch for these first two points follows this list).

  • check whether each function call runs in a separate transaction or not.

    If so, bundle several calls into a single transaction (see the second sketch below) - perhaps the many WAL flushes are killing you.
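To make the first two points concrete, a minimal setup sketch, assuming superuser access and the pg_stat_statements extension available on the server. The column names below are those of PostgreSQL 12, where the relevant timing column is still called total_time:

-- In postgresql.conf (shared_preload_libraries requires a server restart):
-- shared_preload_libraries = 'pg_stat_statements'
-- pg_stat_statements.track = all
-- track_io_timing = on

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- After a nightly run, list the statements with the heaviest I/O.
-- blk_read_time / blk_write_time are only populated with track_io_timing = on.
SELECT query,
       calls,
       total_time,
       blk_read_time + blk_write_time AS io_time_ms
FROM pg_stat_statements
ORDER BY blk_read_time + blk_write_time DESC
LIMIT 10;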
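And for the last point, a sketch of bundling calls. Note that the DO block shown in the question already runs everything in one transaction; this only applies if the refresh is ever driven from a client that issues one call per statement, where each statement commits (and flushes the WAL) on its own. The batch boundaries here are an arbitrary illustration:

BEGIN;
-- Refresh one batch of ids inside a single transaction, so the whole
-- batch needs only one commit-time WAL flush instead of one per call:
SELECT my_function_which_refreshes_lots_of_data(id)
FROM my_table
WHERE id BETWEEN 1 AND 1000;
COMMIT;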

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange