Question

I need to store a very simple data structure on disk: the Point. Its fields are just:

  • Moment - 64-bit integer, representing a time with high precision

  • EventType - 32-bit integer, reference to another object

  • Value - 64-bit floating point number

Requirements:

1) The pair (Moment, EventType) uniquely identifies a Point, so I expect it to be the composite primary key of the table.

2) There is a huge number of Points: up to 5 billion (1-2 TB of disk space). So the format must be as small as possible.

3) The typical, and almost the only, use of the table is retrieving (or creating a view of) millions of Points by exact EventType and a range of Moments.

Questions:

Which RDBMS to choose and why?

What is the optimal sql definition for a table of Points?

Comments on my thoughts below are also appreciated.

My research:

I'm a complete newbie in the field of RDBMS, but I've heard a lot about SQLite. I don't need a huge professional system with all the tools, features and extensions like PostgreSQL or MSSQL. I also don't feel I need a server rather than a simple "embedded" database file, so SQLite looks like the optimal choice. Another great RDBMS with an embedded mode is Firebird, but I was seduced by SQLite's dynamic typing paradigm. It looks like it can save me disk space, because integer fields can be stored in a "smaller" form (1, 2, 3, 4, 6 bytes).

But problems soon appeared.

First of all, SQLite creates a special ROWID column (64 bits long) when the primary key is composite:

CREATE TABLE points (
    moment integer not null,
    event_id integer not null,
    value numeric not null,
    PRIMARY KEY (moment, event_id)
);

This means the table wastes nearly 40% more space for nothing.

I found a nice article about "The WITHOUT ROWID Optimization". But it will only be available in SQLite version 3.8.2 (December 2013). Waiting for the ADO.NET provider I need to support it is not an option.

Another problem is that SQLite uses a B-tree for tables. I'm not sure, but it looks like it is inefficient for selecting data ranges. My main task is to select a big block of Points based on a range of the primary key, so it looks like SQLite will be a bad choice.

Further research seems too hard for me (at least for today). I'm looking forward to help from experienced people.


Solution

B-trees are the most efficient organization for selecting data ranges.

If you search for a constant event_id value and a range of moment values, the two-column index can be used for both lookups only if event_id is the first column in the index:

CREATE TABLE points (
    event_id INTEGER NOT NULL,
    moment INTEGER NOT NULL,
    value NUMERIC NOT NULL,
    PRIMARY KEY (event_id, moment)
);
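One way to check this is to ask SQLite for its query plan. The following is only a minimal sketch using Python's built-in sqlite3 module and an in-memory database; the sample rows and the helper name plan_for are made up for illustration, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE points (
        event_id INTEGER NOT NULL,
        moment INTEGER NOT NULL,
        value NUMERIC NOT NULL,
        PRIMARY KEY (event_id, moment)
    )
""")

# Made-up sample data: 3 event types with 1000 moments each.
conn.executemany(
    "INSERT INTO points (event_id, moment, value) VALUES (?, ?, ?)",
    [(e, m, e * 0.5 + m) for e in range(3) for m in range(1000)],
)

QUERY = """
    SELECT moment, value FROM points
    WHERE event_id = ? AND moment BETWEEN ? AND ?
    ORDER BY moment
"""

def plan_for(connection, sql, params):
    """Return the textual 'detail' column of EXPLAIN QUERY PLAN (hypothetical helper)."""
    plan_rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in plan_rows]

# Exact event_id plus a range of moments, as in the question.
rows = conn.execute(QUERY, (1, 100, 199)).fetchall()
plan = plan_for(conn, QUERY, (1, 100, 199))

print(len(rows))  # 100 rows: moments 100..199 for event_id 1
for detail in plan:
    print(detail)
```

If the plan reports a SEARCH that constrains both event_id and moment, the primary key index is serving the equality and the range in a single pass.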

You should try to use version 3.8.2 so that you can use the WITHOUT ROWID optimization.
The developers are likely to be happy that somebody will test this function, and give you a compiled prerelease version.
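As a rough sketch of what the table looks like with that optimization, assuming the SQLite library behind your driver is 3.8.2 or newer (shown here through Python's sqlite3 module; the sample values and the helper name has_rowid are illustrative only):

```python
import sqlite3

# WITHOUT ROWID is only understood by SQLite 3.8.2 and later.
supported = sqlite3.sqlite_version_info >= (3, 8, 2)

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE points (
        event_id INTEGER NOT NULL,
        moment INTEGER NOT NULL,
        value NUMERIC NOT NULL,
        PRIMARY KEY (event_id, moment)
    )"""
    + (" WITHOUT ROWID" if supported else "")
)

# Made-up sample rows.
conn.executemany(
    "INSERT INTO points (event_id, moment, value) VALUES (?, ?, ?)",
    [(1, 10, 1.5), (1, 20, 2.5), (1, 30, 3.5), (2, 20, 9.5)],
)

def has_rowid(connection, table):
    """Return True if the table still exposes the implicit rowid column."""
    try:
        connection.execute(f"SELECT rowid FROM {table} LIMIT 1")
        return True
    except sqlite3.OperationalError:
        return False

rows = conn.execute(
    "SELECT moment, value FROM points "
    "WHERE event_id = ? AND moment BETWEEN ? AND ? ORDER BY moment",
    (1, 15, 35),
).fetchall()

print(has_rowid(conn, "points"))
print(rows)
```

In a WITHOUT ROWID table the rows are stored directly in the primary key's B-tree, so there is no separate rowid column and no second copy of the key in an automatic index.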

OTHER TIPS

I think that if your table will be used by more than one user, you should not use an embedded DB.
What about Oracle: an index-organized table (by event, perhaps) plus partitioning by range?
Or MySQL with partitioning by range.

If your application really will have only one user, maybe you can use the file system,
something like a partitioned table:
you can create folders named after the range
and files named after the event_id, so each file only needs to store moment + data.
Going further, if for example your moment looks like
201311141820001234567890123456
you can create a folder named 2013111418 and store only the rest of the moment and the data in the file:
20001234567890123456,data
20001234567890123457,data
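
A minimal sketch of that layout in Python, assuming the moment is handled as a digit string like the example above and that the first 10 digits name the folder. The names PREFIX_LEN, split_moment and append_point, the .csv extension and the temporary root directory are all made up for illustration:

```python
import os
import tempfile

PREFIX_LEN = 10  # assumption: the first 10 digits of the moment name the folder

def split_moment(moment):
    """Split a digit-string moment into (folder name, remainder kept in the file)."""
    return moment[:PREFIX_LEN], moment[PREFIX_LEN:]

def append_point(root, event_id, moment, value):
    """Append one 'remainder,value' line to <root>/<folder>/<event_id>.csv."""
    folder, rest = split_moment(moment)
    directory = os.path.join(root, folder)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{event_id}.csv")
    with open(path, "a", encoding="ascii") as f:
        f.write(f"{rest},{value}\n")
    return path

root = tempfile.mkdtemp()  # throwaway directory for the demonstration
path = append_point(root, 7, "201311141820001234567890123456", 1.5)
append_point(root, 7, "201311141820001234567890123457", 2.5)

with open(path, encoding="ascii") as f:
    lines = f.read().splitlines()

print(os.path.relpath(path, root))
print(lines)
```

Reading a range of moments for one event then means opening only the folders whose names fall inside the range and scanning the single file for that event_id in each.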

Licensed under: CC-BY-SA with attribution