Question

I am building a tool for data extraction and transformation. The typical use case is processing large amounts of data transactionally.

The numbers: transactions last roughly 10 seconds to 5 minutes and update 200-10000 rows (the long duration is caused not by the database itself but by outside services that are called during the transaction).

There are two types of agents that access the database: multiple read agents and only one write agent (so there are never multiple concurrent writes).

During the transaction:

  • Read agents should be able to read the database and see it in its current (last committed) state.
  • The write agent should be able to read the database (it both reads and writes during the transaction) and see it in the new (not yet committed) state.

Is PostgreSQL a good choice for this type of load? I know it uses MVCC, so it should be fine in general, but is it OK to rely heavily on long, big transactions?
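To make the visibility requirement concrete, here is roughly the behaviour I need, sketched as two concurrent sessions. As far as I understand, PostgreSQL's default READ COMMITTED isolation already behaves this way; the table and column names are just illustrative:

    -- Session W: the single write agent, inside one long transaction
    BEGIN;
    UPDATE items SET status = 'processed' WHERE id = 42;
    SELECT status FROM items WHERE id = 42;  -- W sees its own uncommitted change: 'processed'

    -- Session R: a read agent in another connection, at the same time
    SELECT status FROM items WHERE id = 42;  -- still sees the old committed value, no blocking

    -- Session W: end of the long transaction
    COMMIT;

    -- Session R: from now on reads see the new state
    SELECT status FROM items WHERE id = 42;  -- 'processed'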

What other open-source transactional databases may be a good choice (I am not limited to SQL)?

P.S.

I do not know whether sharding will affect performance. The database will be sharded; for every shard there will be multiple readers and only one writer, but multiple different shards can be written to at the same time.

I know it is better not to call outside services during a transaction, but in this case that is the whole point: the database is used as a reliable, consistent index for a heavy, huge, slow, eventually-consistent data processing tool.


Solution

Huge disclaimer: as always, only a real-life test can tell you the truth.

But I think PostgreSQL will not let you down if you use a recent version (at least 9.1, preferably 9.2) and tune it properly.
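By "tune it properly" I mostly mean the memory and checkpoint/WAL settings in postgresql.conf. A minimal sketch of the kind of parameters I would look at first; the values below are purely illustrative and depend entirely on your hardware and data set:

    # postgresql.conf excerpt (illustrative values only)
    shared_buffers = 4GB                 # a big slice of RAM, so the hot data set stays cached
    effective_cache_size = 12GB          # roughly what the OS file cache can hold
    work_mem = 64MB                      # per sort/hash operation; be careful with many readers
    maintenance_work_mem = 512MB         # helps VACUUM keep up with bulk updates
    checkpoint_segments = 32             # more WAL between checkpoints for write bursts (pre-9.5 setting)
    checkpoint_completion_target = 0.9   # spread checkpoint I/O out over time
    wal_buffers = 16MB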

I have a somewhat similar load on my server, but with a slightly worse read/write ratio of about 10:1. Transactions range from a few milliseconds up to 1 hour (and sometimes even more), and one transaction can insert or update up to 100k rows. The total number of concurrent writers with long transactions can reach 10 or more. So far so good: I don't really have any serious issues, and performance is great (certainly not worse than I expected).

What really helps is that my hot working data set almost fits into available memory.

So give it a try; it should work well for your load.

OTHER TIPS

Have a look at this link: Maximum transaction size in PostgreSQL

Basically, there can be some technical limits on the software side to how large your transaction can be. The hard limits are generous (for example, a single transaction cannot contain more than roughly 4 billion, i.e. 2^32, commands), but a long-running transaction also prevents VACUUM from reclaiming dead row versions produced while it is open, which can bloat heavily updated tables.
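If you do run long transactions routinely, it is worth keeping an eye on how long they stay open. A simple query against pg_stat_activity (the column names below assume PostgreSQL 9.2 or later) shows the age of every open transaction:

    SELECT pid, usename, state,
           now() - xact_start AS transaction_age,
           query
      FROM pg_stat_activity
     WHERE xact_start IS NOT NULL
     ORDER BY xact_start;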
