I need to store a very simple data structure on disk: the Point. Its fields are just:
Moment - 64-bit integer, representing a time with high precision
EventType - 32-bit integer, reference to another object
Value - 64-bit floating point number
Requirements:
1) The pair (Moment + EventType) is the unique identifier of a Point, so I expect it to be a composite primary key for the table.
2) There's a huge number of Points: up to 5 billion (1-2 TB of disk space). So the storage format must be as compact as possible.
3) The typical, and almost the only, use of the table is retrieving (or creating a view of) millions of Points by exact EventType and a range of Moments.
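To put the size requirement in numbers, here is a rough sketch of the raw payload, assuming a fixed-width layout with no padding. The names `POINT_FORMAT`, `pack_point` and `raw_size_bytes` are made up for illustration, and Python is used only because it is easy to run. The result is a lower bound for the data alone; any real storage format adds its own per-row and index overhead on top.

```python
import struct

# Hypothetical fixed-width layout: int64 Moment, int32 EventType, float64 Value.
# "<" selects little-endian byte order with no alignment padding.
POINT_FORMAT = "<qid"
POINT_SIZE = struct.calcsize(POINT_FORMAT)  # 8 + 4 + 8 bytes


def pack_point(moment: int, event_type: int, value: float) -> bytes:
    """Serialize one Point into its raw fixed-width form."""
    return struct.pack(POINT_FORMAT, moment, event_type, value)


def raw_size_bytes(point_count: int) -> int:
    """Raw payload size for the given number of Points, without any overhead."""
    return point_count * POINT_SIZE


if __name__ == "__main__":
    record = pack_point(635_200_000_000_000_000, 42, 3.14)
    print(len(record), "bytes per Point")
    print(raw_size_bytes(5_000_000_000) / 10**9, "GB of raw payload for 5 billion Points")
```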
Questions:
Which RDBMS to choose and why?
What is the optimal SQL definition for a table of Points?
Comments on my thoughts below are also appreciated.
My research:
I'm a complete newbie in the field of RDBMS, but I've heard a lot about SQLite. I don't need a huge professional system with all the tools, features and extensions, like PostgreSQL or MSSQL. I also don't feel I need a server instead of a simple "embedded" database file, so SQLite looks like the optimal choice. Another great RDBMS with an embedded mode is Firebird, but I was seduced by SQLite's dynamic typing paradigm. It looks like it can save me disk space, because integer fields can be stored in a "smaller" form (1, 2, 3, 4 or 6 bytes).
But problems soon appeared.
First of all, SQLite creates a special ROWID column (64 bits long) when the primary key is composite:
CREATE TABLE points (
moment integer not null,
event_id integer not null,
value numeric not null,
PRIMARY KEY (moment, event_id)
);
This means the table wastes nearly 40% more space for nothing.
I found a nice article about "The WITHOUT ROWID Optimization". But it will be available only in version 3.8.2 of SQLite (December 2013). Waiting for the ADO.NET provider I need is not an option.
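To measure the overhead rather than guess it, something like the following sketch could be used. It assumes a SQLite build of 3.8.2 or later is reachable through Python's `sqlite3` module (Python is only a stand-in here, not the ADO.NET provider I actually need). The helper `database_size_bytes` and the synthetic data are made up for illustration, so the ratio it prints is only indicative.

```python
import os
import sqlite3
import tempfile

# Table definition from the question, with an optional WITHOUT ROWID suffix.
CREATE_SQL = """
CREATE TABLE points (
    moment integer not null,
    event_id integer not null,
    value numeric not null,
    PRIMARY KEY (moment, event_id)
){suffix}
"""


def database_size_bytes(without_rowid: bool, row_count: int = 20_000) -> int:
    """Fill a throwaway database file with synthetic Points and return its size."""
    suffix = " WITHOUT ROWID" if without_rowid else ""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "points.db")
        con = sqlite3.connect(path)
        con.execute(CREATE_SQL.format(suffix=suffix))
        # Synthetic data: large 64-bit moments, a handful of event ids, float values.
        rows = (
            (635_200_000_000_000_000 + i * 10_000, i % 10, i * 0.1)
            for i in range(row_count)
        )
        con.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)
        con.commit()
        con.close()
        return os.path.getsize(path)


if __name__ == "__main__":
    with_rowid = database_size_bytes(without_rowid=False)
    without_rowid = database_size_bytes(without_rowid=True)
    print("rowid table:        ", with_rowid, "bytes")
    print("WITHOUT ROWID table:", without_rowid, "bytes")
    print("ratio:", round(with_rowid / without_rowid, 2))
```

The rowid variant keeps the key columns twice, once in the table and once in the automatic primary key index, so the file size compared here covers both.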
Another problem is that SQLite uses a B-tree for tables. I'm not sure, but it looks like it is inefficient for selecting data ranges. My main task is to select a big block of Points based on a range of the primary key, so it looks like SQLite will be a bad choice.
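One way to check this worry would be to ask SQLite for its query plan. The sketch below is only an experiment under assumptions: it orders the key as (event_id, moment) rather than (moment, event_id) as in my definition above, uses a tiny synthetic data set, and the helper names (`build_demo_db`, `select_range`, `query_plan`) are made up. Python's `sqlite3` module is used just because it is easy to run.

```python
import sqlite3

RANGE_QUERY = """
    SELECT moment, value
    FROM points
    WHERE event_id = ? AND moment BETWEEN ? AND ?
    ORDER BY moment
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table keyed by (event_id, moment) with synthetic rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """
        CREATE TABLE points (
            event_id integer not null,
            moment integer not null,
            value numeric not null,
            PRIMARY KEY (event_id, moment)
        )
        """
    )
    # 10,000 synthetic Points spread over 5 event ids.
    rows = ((i % 5, 1_000 + i, i * 0.25) for i in range(10_000))
    con.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)
    con.commit()
    return con


def select_range(con: sqlite3.Connection, event_id: int, first: int, last: int):
    """Fetch the Points of one event type within an inclusive range of moments."""
    return con.execute(RANGE_QUERY, (event_id, first, last)).fetchall()


def query_plan(con: sqlite3.Connection):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = con.execute("EXPLAIN QUERY PLAN " + RANGE_QUERY, (0, 0, 0)).fetchall()
    return [row[-1] for row in rows]


if __name__ == "__main__":
    con = build_demo_db()
    print(select_range(con, 2, 1_000, 1_020))
    for line in query_plan(con):
        print(line)
```

If the plan reports a SEARCH on the primary key index rather than a full SCAN of the table, the range is being located through the B-tree, which would suggest the concern above depends more on the column order of the key than on the B-tree itself.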
Further research seems too hard for me (at least for today). I'm looking forward to help from experienced people.