Choosing/configuring a database for reliable, consistent, high write throughput, sacrificing latency

StackOverflow https://stackoverflow.com/questions/12832816

Question

I'm working on a real-time application with the following characteristics:

  • Hundreds of clients will be inserting rows/documents at the same time, each inserting a row every couple of seconds.
  • Largely append-only; almost all rows/documents, once inserted, are never changed.
  • A client should only see success when data has been flushed to disk, and thereafter read-your-writes consistency should hold.
  • Clients are willing to wait on the order of seconds for confirmation - long enough for multiple disk seeks and writes to happen.
  • There is too much data to fit in RAM (ruling out options like Redis). But written-long-ago rows are rarely accessed, so it's acceptable to not have them in memory.
  • Ideally, these writes should not block reads.
  • A key-value store is fine, but there needs to be at least one reliable auto-incrementing index.

In other words (and tl;dr), the clients can tolerate latency, but they need a lot of trustworthy write throughput - more throughput than a "one write is one disk operation" approach would allow.

I'm envisioning a database implemented something like this: accept a number of TCP connections (theoretically limited only by the number of file descriptors), buffer the incoming writes in memory, log batches of them to disk as often as possible (along with the updates to the auto-incrementing index), and respond on each TCP connection only when the associated disk write has completed. Or it could be as simple as a lazily-writing database that publishes a message when it has done a disk write (clients wait for the lazy response, then wait for the write notification before reporting success).

I think that with such high latency tolerance, this isn't asking too much. And I'd imagine that others have had this problem, such as financial companies that can't afford to lose data, but can afford to delay the response to any one client.

Do any battle-tested database solutions like Postgres, CouchDB/Couchbase, or MongoDB support modes of operation like this?


Solution

PostgreSQL should fit this workload well; pretty much everything you've specified is within its normal feature set. Pg is ACID-compliant, it supports group commit to reduce sync overhead, writers don't block readers, and it relies on the OS for caching, so it will naturally tend to keep only the hot data set in memory.

"Clients are willing to wait on the order of seconds for confirmation - long enough for multiple disk seeks and writes to happen"

If you go with PostgreSQL, your application is ideally suited to a really big commit_delay, which will help immensely with write throughput. You can't use synchronous_commit = off, because you need confirmation that the commit has hit disk before replying, but you can hold commits queued up briefly (commit_delay is capped at 100 ms) so that many concurrent transactions share each sync.
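A minimal sketch of the relevant postgresql.conf settings (the values are illustrative starting points, not tuned recommendations; commit_delay is given in microseconds and capped at 100000, i.e. 100 ms):

    # Keep full durability, but let concurrent commits share one WAL flush.
    synchronous_commit = on
    commit_delay = 100000      # microseconds; 100 ms is the maximum
    commit_siblings = 5        # only delay when at least 5 other txns are active

With hundreds of clients committing concurrently, many commits ride along on each WAL flush, so the per-transaction fsync cost drops sharply.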

If you use Pg for a job like this, you will need to tune checkpointing to make sure checkpoints don't stall I/O. Make sure the bgwriter is writing out dirty buffers aggressively. Ensure autovacuum is running frequently - you aren't deleting from the tables, but the indexes still need maintenance, and so do the table statistics.
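Again as a rough, illustrative sketch (placeholder values, to be tuned against your actual hardware and write volume), the checkpoint, background-writer, and autovacuum knobs in postgresql.conf might look something like this:

    # Spread checkpoint I/O out instead of letting it arrive in bursts.
    checkpoint_timeout = 15min
    checkpoint_completion_target = 0.9
    # Also raise checkpoint_segments (pre-9.5) or max_wal_size (9.5+).

    # Make the background writer push dirty buffers out more aggressively.
    bgwriter_delay = 100ms
    bgwriter_lru_maxpages = 500
    bgwriter_lru_multiplier = 4.0

    # Keep index maintenance and table statistics current on append-heavy tables.
    autovacuum = on
    autovacuum_naptime = 30s
    autovacuum_analyze_scale_factor = 0.02

None of these numbers are magic; watch pg_stat_bgwriter and the log_checkpoints output to see whether checkpoints are still being forced too often.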

If you're expecting lots of data, and your queries usually have a temporal element, consider partitioning the table into chunks of (say) one month for the first year, then consolidating everything older than 12 months into tables partitioned by year. Pg has only limited built-in partitioning (it's hacked together using inheritance and constraint exclusion), so you have to do it by hand/script using triggers, but it does the job.
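A bare-bones sketch of that trigger-based scheme, using a hypothetical events table (the table name, columns, and date ranges are invented for illustration; each child needs its own indexes and constraints, since those are not inherited from the parent):

    -- Parent table; the bigserial provides the reliable auto-incrementing index.
    CREATE TABLE events (
        id         bigserial,
        client_id  integer      NOT NULL,
        payload    text,
        created_at timestamptz  NOT NULL DEFAULT now()
    );

    -- One child per month; the CHECK constraint lets constraint exclusion
    -- skip partitions that can't match a query's time range.
    CREATE TABLE events_2012_10 (
        CHECK (created_at >= '2012-10-01' AND created_at < '2012-11-01')
    ) INHERITS (events);
    CREATE INDEX ON events_2012_10 (created_at);

    -- Route inserts on the parent into the correct child.
    CREATE OR REPLACE FUNCTION events_insert_router() RETURNS trigger AS $$
    BEGIN
        IF NEW.created_at >= '2012-10-01' AND NEW.created_at < '2012-11-01' THEN
            INSERT INTO events_2012_10 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for timestamp %', NEW.created_at;
        END IF;
        RETURN NULL;  -- the row has already been stored in the child table
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER events_insert_trigger
        BEFORE INSERT ON events
        FOR EACH ROW EXECUTE PROCEDURE events_insert_router();

New monthly children (and the corresponding branch in the router function) are typically created ahead of time by a maintenance script or cron job.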

