Question

I have an experiment streaming up to 1 Mb/s of numeric data which needs to be stored for later processing. It seems as easy to write directly into a database as to a CSV file, and I would then have the ability to easily retrieve subsets or ranges.

I have experience with SQLite 2 (when it only had text fields) and it seemed pretty much as fast as raw disk access. Any opinions on the best current in-process DBMS for this application?

Sorry - I should have added that this is C++, initially on Windows, but cross-platform is nice. Ideally the DB binary file format should be cross-platform.


Solution

If you only need to read/write the data, without any checking or manipulation done in the database, then either should do fine. Firebird's database file can be copied, as long as the systems have the same endianness (i.e., you cannot copy the file between systems with Intel and PPC processors, but Intel-to-Intel is fine).

However, if you ever need to do anything with the data beyond simple reads and writes, then go with Firebird, as it is a full SQL server with all the 'enterprise' features like triggers, views, stored procedures, temporary tables, etc.

BTW, if you decide to give Firebird a try, I highly recommend you use the IBPP library to access it. It is a very thin C++ wrapper around Firebird's C API. It has about 10 classes that encapsulate everything and it's dead easy to use.
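For illustration only, a minimal IBPP sketch might look roughly like this (the table, column names, database path, and credentials are invented for the example; check the IBPP headers for the exact overloads):

```cpp
#include <ibpp.h>  // IBPP - C++ wrapper around the Firebird C API

int main()
{
    // Connect to a local Firebird database (path and credentials are examples only).
    IBPP::Database db = IBPP::DatabaseFactory("localhost", "C:\\data\\samples.fdb",
                                              "SYSDBA", "masterkey");
    db->Connect();

    IBPP::Transaction tr = IBPP::TransactionFactory(db);
    tr->Start();

    // One prepared statement, reused for every sample.
    IBPP::Statement st = IBPP::StatementFactory(db, tr);
    st->Prepare("INSERT INTO samples (t, value) VALUES (?, ?)");

    for (int i = 0; i < 1000; ++i)
    {
        st->Set(1, i * 0.001);  // timestamp (seconds)
        st->Set(2, 42.0);       // measured value
        st->Execute();
    }

    tr->Commit();
    db->Disconnect();
    return 0;
}
```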

OTHER TIPS

If all you want to do is store the numbers and be able to easily do range queries, you can just take any standard tree data structure you have available in the STL and serialize it to disk. This may bite you in a cross-platform environment, especially if you are trying to go cross-architecture.
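A rough sketch of that approach: keep the samples in a std::map keyed by timestamp, dump it to a binary file, and use lower_bound for range queries after reading it back. This naive dump does no endianness or size handling, so the cross-architecture caveat above applies in full.

```cpp
#include <cstdint>
#include <fstream>
#include <map>

using Series = std::map<double, double>;  // timestamp -> value

// Write the map as a raw count followed by key/value pairs.
// Architecture-dependent: assumes identical endianness and sizeof(double).
void save(const Series& s, const char* path)
{
    std::ofstream out(path, std::ios::binary);
    std::uint64_t n = s.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    for (const auto& kv : s)
    {
        out.write(reinterpret_cast<const char*>(&kv.first), sizeof kv.first);
        out.write(reinterpret_cast<const char*>(&kv.second), sizeof kv.second);
    }
}

Series load(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    Series s;
    for (std::uint64_t i = 0; i < n; ++i)
    {
        double t = 0, v = 0;
        in.read(reinterpret_cast<char*>(&t), sizeof t);
        in.read(reinterpret_cast<char*>(&v), sizeof v);
        s.emplace(t, v);
    }
    return s;
}

// Range query for t0 <= timestamp < t1:
//   auto first = s.lower_bound(t0);
//   auto last  = s.lower_bound(t1);
```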

As far as more flexible/people-friendly solutions go, sqlite3 is widely used, solid, stable, and very nice all around.

BerkeleyDB has a number of good features for which one would use it, but none of them apply in this scenario, imho.

I'd say go with sqlite3 if you can accept the license agreement.

-D

Depends what language you are using. If it's C/C++, Tcl, or PHP, SQLite is still among the best in the single-writer scenario. If you don't need SQL access, a Berkeley DB-style library might be slightly faster, like Sleepycat or gdbm. With multiple writers you could consider a separate client/server solution, but it doesn't sound like you need it. If you're using Java, HSQLDB or Derby (shipped with Sun's JVM under the "JavaDB" branding) seem to be the solutions of choice.

You may also want to consider a numeric data file format that is specifically geared towards storing these types of large data sets. For example:

  • HDF -- the most common and well supported in many languages, with free libraries. I highly recommend this (see the sketch after this list).
  • CDF -- a similar format used by NASA (but usable by anyone).
  • NetCDF -- another similar format (the latest version is actually a stripped-down HDF5).

This link has some info about the differences between the above data set types: http://nssdc.gsfc.nasa.gov/cdf/html/FAQ.html
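If HDF5 looks like a fit, a minimal write-path sketch using the HDF5 C API could look roughly like this (file and dataset names are placeholders; a real streaming writer would normally use a chunked, extendable dataset instead of a fixed-size one):

```cpp
#include <hdf5.h>
#include <vector>

int main()
{
    // A buffer of samples accumulated from the experiment (example data).
    std::vector<double> samples(1024, 0.0);

    // Create a new HDF5 file and a 1-D dataset of doubles.
    hid_t file = H5Fcreate("samples.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hsize_t dims[1] = { samples.size() };
    hid_t space = H5Screate_simple(1, dims, nullptr);
    hid_t dset  = H5Dcreate2(file, "/samples", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    // Write the whole buffer in one call.
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, samples.data());

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}
```

The HDF5 file format is also portable across platforms, which matches the cross-platform requirement in the question.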

I suspect that neither database will allow you to write data at such a high speed. You can check this yourself to be sure. In my experience, SQLite failed to INSERT more than 1000 rows per second for a very simple table with a single integer primary key.
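Whether that rate holds depends heavily on how SQLite is driven; the usual mitigation is a prepared statement with many inserts batched inside a single transaction. A sketch of that pattern (table name invented for the example):

```cpp
#include <sqlite3.h>

int main()
{
    sqlite3* db = nullptr;
    sqlite3_open("samples.db", &db);

    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS samples (t REAL, value REAL)",
                 nullptr, nullptr, nullptr);

    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, "INSERT INTO samples (t, value) VALUES (?, ?)", -1,
                       &stmt, nullptr);

    // One transaction around the whole batch: without this, every INSERT
    // becomes its own transaction and throughput drops dramatically.
    sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr);
    for (int i = 0; i < 100000; ++i)
    {
        sqlite3_bind_double(stmt, 1, i * 0.001);  // timestamp
        sqlite3_bind_double(stmt, 2, 42.0);       // measured value
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }
    sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr);

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}
```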

If performance becomes a problem, I would write the files in CSV format and later load their data into the database (SQLite or Firebird) for further processing.
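A minimal sketch of that fallback (file name and column layout are just examples); the resulting file can later be bulk-loaded with the sqlite3 shell using .mode csv followed by .import:

```cpp
#include <fstream>
#include <iomanip>

int main()
{
    // Append samples to a CSV file as they arrive (timestamp,value per line).
    std::ofstream csv("samples.csv", std::ios::app);
    csv << std::setprecision(15);
    for (int i = 0; i < 1000; ++i)
        csv << i * 0.001 << ',' << 42.0 << '\n';
    return 0;
}
```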

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow