Question

Is there an ACID framework for bulk data persistence which would also allow some basic search capabilities? I am not looking for a full-blown DBMS, but rather something fast, light and simple. Even something that just took care of atomic commits would be great, so I don't have to reinvent that myself to survive power failures.

SQL Server is too slow for this and has too much overhead; SQLite is even slower (with potentially less overhead?).

Basically, I need to store large quantities of timestamped data each second. As normalized data, this would correspond to ~10k table rows, but as binary data it can be represented in ~200 KB. Obviously, writing 200 KB to disk is a piece of cake compared to writing 10k rows to a relational database.

I could simply persist it in one or more large binary files and then implement some indexing of my own to allow fast filtering on certain fields, but the only things that frighten me are non-atomic transactions and read/write locking scenarios.

Any recommendations? I am using C# btw, so anything with .NET wrappers would be preferred.

[Edit] Regarding ACID, I just found this, for example: Managed wrapper for Transactional NTFS (although TxF is a "Vista and later" feature).
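
As far as I can tell, such a wrapper essentially P/Invokes the Kernel Transaction Manager (CreateTransaction / CreateFileTransacted / CommitTransaction). A minimal sketch of that idea - the WriteAtomically helper and the constants are just illustrative, not the actual wrapper's API:

    using System;
    using System.ComponentModel;
    using System.IO;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    static class TxfSketch
    {
        // Kernel Transaction Manager / TxF native entry points (Vista and later).
        [DllImport("KtmW32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
        static extern IntPtr CreateTransaction(IntPtr securityAttributes, IntPtr uow,
            uint createOptions, uint isolationLevel, uint isolationFlags,
            uint timeout, string description);

        [DllImport("KtmW32.dll", SetLastError = true)]
        static extern bool CommitTransaction(IntPtr transaction);

        [DllImport("Kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
        static extern SafeFileHandle CreateFileTransacted(string fileName, uint desiredAccess,
            uint shareMode, IntPtr securityAttributes, uint creationDisposition,
            uint flagsAndAttributes, IntPtr templateFile, IntPtr transaction,
            IntPtr miniVersion, IntPtr extendedParameter);

        [DllImport("Kernel32.dll", SetLastError = true)]
        static extern bool CloseHandle(IntPtr handle);

        const uint GENERIC_WRITE = 0x40000000;
        const uint CREATE_ALWAYS = 2;
        const uint FILE_ATTRIBUTE_NORMAL = 0x80;

        // Writes the whole buffer inside one NTFS transaction: after a crash the
        // file either contains the complete new payload or its previous contents.
        public static void WriteAtomically(string path, byte[] payload)
        {
            IntPtr tx = CreateTransaction(IntPtr.Zero, IntPtr.Zero, 0, 0, 0, 0, "bulk write");
            if (tx == new IntPtr(-1))
                throw new Win32Exception(Marshal.GetLastWin32Error());
            try
            {
                using (SafeFileHandle handle = CreateFileTransacted(path, GENERIC_WRITE, 0,
                    IntPtr.Zero, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, IntPtr.Zero, tx,
                    IntPtr.Zero, IntPtr.Zero))
                {
                    if (handle.IsInvalid)
                        throw new Win32Exception(Marshal.GetLastWin32Error());
                    using (var stream = new FileStream(handle, FileAccess.Write))
                        stream.Write(payload, 0, payload.Length);
                }
                if (!CommitTransaction(tx))
                    throw new Win32Exception(Marshal.GetLastWin32Error());
            }
            finally
            {
                CloseHandle(tx);
            }
        }
    }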


Solution

Traditional SQL-based stores will give you ACID, but bulk inserts of many rows will be slow. On the other hand, NoSQL solutions/key-value stores usually won't give you reliable transactions, or any seamless way to index for fast lookups on anything other than a single key. So we need something that combines the benefits of both approaches.

I would consider using CouchDB (a NoSQL, map/reduce, document-based DB with a RESTful API) and adopting the following strategy: CouchDB doesn't have transactions in the sense of saving multiple documents atomically, but when it comes to saving a single document it is super-reliable and atomic, and it also gives you multi-version concurrency control.
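
Roughly like this from C# - a minimal sketch only; the database name, document id scheme and document shape are made up for the example, and any HTTP client would do:

    using System;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;

    class CouchBulkWriter
    {
        static readonly HttpClient Http =
            new HttpClient { BaseAddress = new Uri("http://localhost:5984/") };

        // Saves one second's worth of samples as a single CouchDB document.
        // The PUT either succeeds as a whole (201 Created) or not at all - there
        // is no state where only part of the document has been stored.
        public static async Task SaveBulkAsync(DateTime timestamp, string samplesJson)
        {
            string docId = "bulk-" + timestamp.ToString("yyyyMMddTHHmmssfff");
            string body = "{ \"timestamp\": \"" + timestamp.ToString("o") + "\", "
                        + "\"samples\": " + samplesJson + " }";   // use a real JSON serializer in practice

            HttpResponseMessage response = await Http.PutAsync(
                "sensordata/" + docId,     // "sensordata" is a made-up database name
                new StringContent(body, Encoding.UTF8, "application/json"));

            response.EnsureSuccessStatusCode();
        }
    }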

So if you have bulks of 10,000 records at ~200-300 KB each, you can save each bulk as a single document. It may sound strange, but the thing is you can build views on top of your document collections, and those views are actually incremental indexes. One document may produce multiple view rows. Views are written in JavaScript (which is evaluated only once, on document creation/update), so you can index however you want - by keywords, numeric values, dates - virtually anything you can do with JavaScript. Fetching view results is very fast, because they are pre-indexed in a B+-tree.
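
For example, a design document like the following (again just a sketch - the view name and the channel/value fields are assumptions about your data) emits one index row per sample, and a ranged key lookup then hits only the pre-built index:

    using System;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;

    class CouchViewSketch
    {
        static readonly HttpClient Http =
            new HttpClient { BaseAddress = new Uri("http://localhost:5984/") };

        // Stores a design document whose JavaScript map function emits one row per
        // sample, keyed by [channel, timestamp]. CouchDB maintains the resulting
        // B+-tree index incrementally as bulk documents are added.
        public static async Task CreateViewAsync()
        {
            const string designDoc = @"{
              ""views"": {
                ""by_channel"": {
                  ""map"": ""function(doc) { for (var i = 0; i < doc.samples.length; i++) emit([doc.samples[i].channel, doc.timestamp], doc.samples[i].value); }""
                }
              }
            }";
            var response = await Http.PutAsync("sensordata/_design/readings",
                new StringContent(designDoc, Encoding.UTF8, "application/json"));
            response.EnsureSuccessStatusCode();
        }

        // Fetches all rows for one channel; the ranged key lookup reads the
        // pre-built index instead of scanning the documents.
        public static async Task<string> QueryChannelAsync(string channel)
        {
            string startKey = Uri.EscapeDataString("[\"" + channel + "\"]");
            string endKey   = Uri.EscapeDataString("[\"" + channel + "\",{}]");
            return await Http.GetStringAsync(
                "sensordata/_design/readings/_view/by_channel?startkey=" + startKey + "&endkey=" + endKey);
        }
    }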

Benefits of this approach:

  • CouchDB uses JSON over HTTP as its data transport protocol, so you can use any HTTP client, any REST client, or a native C# wrapper (there are several available)
  • Your bulk insert of that 200 KB document will be atomic and take a single HTTP request
  • Your insert can be asynchronous, because it's just an HTTP request.
  • You get MVCC - CouchDB handles concurrency very well, so you can forget about locks.

Just give it a chance - it saved me tons of time.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow