Question

I am considering various MVCC-capable databases for an upcoming project and PostgreSQL came onto my radar.

The requirements of my program involve a sequence roughly like the following:

  1. Read some information from the current version of the database, modify 80-90% of the data and write it back in one or more transactions (imagine something like updating the grid in Conway's Game of Life where both the old and new state of the grid are required).

  2. Wait 1-2 minutes after the commit. During this time, clients can issue reads against the new data.

  3. Repeat.

The database will be limited to something like 2-4 GB.

~90% of the changes will be updates to existing objects, ~5% will be new objects, and ~5% will be deletions.

So my question is: can I reasonably run a plain VACUUM command as step 1.5, once every 1-2 minutes, and have PostgreSQL keep up with potentially 2-3+ GB of changes being made each time?


Solution

I believe PostgreSQL should do fine in this scenario. The workload is unusual enough that a manual VACUUM between the huge updates seems like a reasonable option.
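For step 1.5, that manual vacuum might look like the sketch below; the table name grid is just a placeholder for whatever table receives the bulk changes, and adding ANALYZE refreshes the planner statistics at the same time:

    -- Step 1.5, run right after the bulk-update transaction commits.
    -- "grid" is a placeholder name; repeat for every heavily updated table.
    VACUUM ANALYZE grid;

Note that VACUUM cannot run inside a transaction block, so it has to be issued on its own after the commit.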

Consider whether, instead of the huge updates, you can generate a new set of tables, ANALYZE them (this is necessary!), and then, using the power of transactional DDL, drop the old ones and rename the new ones into their place, as sketched below. This should remove your worries about VACUUM.
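A minimal sketch of that table swap, assuming a Game-of-Life-style table; grid, grid_next, and compute_next_state() are placeholder names, not part of the original question:

    BEGIN;

    -- Build the next generation into a brand-new table.
    -- The SELECT stands in for whatever transformation produces the new state.
    CREATE TABLE grid_next AS
        SELECT cell_id, compute_next_state(state) AS state
        FROM grid;

    -- Gather planner statistics before the swap (the "necessary!" part above).
    ANALYZE grid_next;

    -- Transactional DDL: drop the old table and rename the new one into place.
    DROP TABLE grid;
    ALTER TABLE grid_next RENAME TO grid;

    COMMIT;

Keep in mind that CREATE TABLE ... AS does not carry over indexes, constraints, or permissions, so those would need to be recreated before the COMMIT, and readers will briefly block on the exclusive lock taken by the DROP and RENAME.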

In either scenario, you should be doing some serious tuning. In particular, look at shared_buffers, the checkpoint-related parameters, and the vacuum-related ones. Also, remember to benchmark with realistic workloads.
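As a rough illustration only, a postgresql.conf sketch for a machine with several GB of RAM; the values are assumptions to be validated against your own benchmarks, not recommendations:

    # Illustrative starting points only; verify with realistic workloads.
    shared_buffers = 1GB                  # hold a good share of the 2-4 GB data set in cache
    checkpoint_timeout = 15min            # spread checkpoints across several update cycles
    checkpoint_completion_target = 0.9    # smooth out checkpoint I/O
    max_wal_size = 4GB                    # avoid forced checkpoints during a bulk update
    maintenance_work_mem = 512MB          # speeds up VACUUM and ANALYZE
    vacuum_cost_delay = 0                 # let the manual VACUUM run at full speed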

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow