Question

I have different kinds of measurements; let’s call them A, B, and C. They are not related to each other. All three have the same structure: ID (integer), value (float), and experiment_id (integer, a reference to an experiments table).

I do not know the best way to store this information.

A) Is it better to use three tables (A, B, and C)?

B) Or is it better to store all of them in one table called measurements, with an additional measurement_type column that records whether a row is A, B, or C (plus the appropriate indexes)?

In my application I would like to have three Models called A, B, and C.

The solution should be fast, because one day there might be hundreds of millions or even billions of entries for each measurement type. Furthermore, measurement types D, E, ..., Z might be added later.

By the way, I am using an Oracle Enterprise database.


Solution

Based on your comments, and assuming your focus is on query performance (as opposed to INSERT performance), it looks like you need a model similar to this:

[Diagram: proposed data model (image not available)]

Use ORGANIZATION INDEX on the MEASUREMENT table, i.e. make it an index-organized table. Also consider the COMPRESS clause, since many rows will share the same leading EXPERIMENT_ID.

The index I1 consists of {FEATURE_ID, EXPERIMENT_ID, MEASUREMENT_TYPE, VALUE}, in that order. Consider the COMPRESS clause here as well, since many rows will share the same leading FEATURE_ID.

This gives us two B-Trees:

  1. The B-Tree "underneath" the PK, i.e. the index-organized table itself.
  2. The B-Tree "underneath" the index I1.
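To experiment with this layout without an Oracle instance, here is a rough sketch using Python's built-in sqlite3 module as a stand-in. This is an assumption on my part and not Oracle syntax: a SQLite WITHOUT ROWID table is stored in its primary-key B-tree, roughly like an index-organized table, and an index over all four columns plays the role of I1. SQLite has no counterpart to Oracle's COMPRESS clause, and the primary key (experiment_id, feature_id, measurement_type) is a guess because the original diagram is missing. The function returns SQLite's query plans for a lookup by experiment and a lookup by feature.

```python
import sqlite3

# Assumption: PK = (experiment_id, feature_id, measurement_type). The original
# diagram is unavailable, so this key is illustrative only.
DDL = """
CREATE TABLE measurement (
    experiment_id    INTEGER NOT NULL,
    feature_id       INTEGER NOT NULL,
    measurement_type TEXT    NOT NULL,
    value            REAL    NOT NULL,
    PRIMARY KEY (experiment_id, feature_id, measurement_type)
) WITHOUT ROWID;  -- rows live in the PK B-tree itself, similar to an IOT

-- Analogue of I1: holds every column, so it can answer queries on its own.
CREATE INDEX i1 ON measurement (feature_id, experiment_id, measurement_type, value);
"""


def query_plans() -> dict:
    """Return SQLite's plan text for a lookup by experiment and by feature."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    plans = {}
    for name, column in (("by_experiment", "experiment_id"),
                         ("by_feature", "feature_id")):
        rows = conn.execute(
            f"EXPLAIN QUERY PLAN SELECT * FROM measurement WHERE {column} = ?",
            (1,),
        ).fetchall()
        # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
        plans[name] = " | ".join(row[-1] for row in rows)
    conn.close()
    return plans


if __name__ == "__main__":
    for name, plan in query_plans().items():
        print(f"{name}: {plan}")
```

The experiment query should be reported as a search on the primary key and the feature query as a search using the covering index i1, mirroring the two B-Trees above. In Oracle the corresponding DDL would use ORGANIZATION INDEX and COMPRESS; check the exact clauses against the Oracle SQL reference.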

A query on EXPERIMENT_ID can be satisfied by a single index range scan in the PK B-Tree, with no table heap access (there is no heap). The PK B-Tree naturally stores the rows belonging to the same experiment physically close together, so I/O is minimized.

A query on FEATURE_ID can also be satisfied by a single range scan (in the I1 B-Tree). The I1 is a covering index, so there is no need to do a double-lookup into the PK B-Tree. The I1 B-Tree naturally stores the rows belonging to the same feature physically close together, so I/O is minimized.
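The "single range scan" argument in the two paragraphs above can be illustrated with a toy model in plain Python. This is not how Oracle implements B-Trees: each tree is approximated by a list of tuples kept in key order, and a range scan is a binary search to the first matching key followed by reading consecutive entries. The sample rows, the integer keys, and the PK column order are made up for illustration.

```python
from bisect import bisect_left

# Made-up rows: (experiment_id, feature_id, measurement_type, value).
ROWS = [
    (2, 10, "A", 0.5),
    (1, 20, "B", 1.5),
    (1, 10, "A", 2.5),
    (3, 20, "C", 3.5),
    (2, 20, "A", 4.5),
]

# Toy "PK B-Tree": complete rows ordered by (experiment_id, feature_id, type).
PK_TREE = sorted(ROWS)

# Toy "I1 B-Tree": all four columns ordered by (feature_id, experiment_id, ...).
I1_TREE = sorted((f, e, t, v) for e, f, t, v in ROWS)


def range_scan(tree, leading_key):
    """Return the contiguous run of entries whose first column == leading_key.

    Assumes integer leading keys, so leading_key + 1 bounds the range.
    """
    lo = bisect_left(tree, (leading_key,))
    hi = bisect_left(tree, (leading_key + 1,))
    return tree[lo:hi]


def rows_for_experiment(experiment_id):
    # One scan of the PK tree; its entries are the rows themselves (no heap).
    return range_scan(PK_TREE, experiment_id)


def rows_for_feature(feature_id):
    # One scan of I1; entries already hold every column (covering), so the
    # tuple is only reordered back to row layout, with no PK lookup.
    return [(e, f, t, v) for f, e, t, v in range_scan(I1_TREE, feature_id)]
```

For example, rows_for_feature(20) returns [(1, 20, 'B', 1.5), (2, 20, 'A', 4.5), (3, 20, 'C', 3.5)], taken from one contiguous slice of I1_TREE.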

I'd shy away from horizontally partitioning the MEASUREMENT table on MEASUREMENT_TYPE, unless you have performed measurements on representative amounts of data and concluded it provides a performance tradeoff that better suits your needs.

Other answers

Since the set of measurement types can grow and is not restricted to A, B and C, I recommend option B), because it supports additional measurement types without any schema change.
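A minimal sketch of option B), again using Python's sqlite3 purely as a stand-in for Oracle: one measurements table carries a measurement_type discriminator, and each of the models A, B and C from the question is a thin class that filters on its own type code. The class names follow the question, but Measurement, TYPE_CODE, insert, for_experiment and the index name are hypothetical and do not come from any ORM. Adding a type D only requires a new subclass, not a new table or index.

```python
import sqlite3
from dataclasses import dataclass
from typing import ClassVar

SCHEMA = """
CREATE TABLE measurements (
    id               INTEGER PRIMARY KEY,
    experiment_id    INTEGER NOT NULL,
    measurement_type TEXT    NOT NULL,  -- discriminator: 'A', 'B', 'C', ...
    value            REAL    NOT NULL
);
CREATE INDEX measurements_type_exp ON measurements (measurement_type, experiment_id);
"""


@dataclass
class Measurement:
    id: int
    experiment_id: int
    value: float
    TYPE_CODE: ClassVar[str] = ""  # hypothetical convention: set per subclass

    @classmethod
    def insert(cls, conn, experiment_id, value):
        cur = conn.execute(
            "INSERT INTO measurements (experiment_id, measurement_type, value) "
            "VALUES (?, ?, ?)",
            (experiment_id, cls.TYPE_CODE, value),
        )
        return cls(cur.lastrowid, experiment_id, value)

    @classmethod
    def for_experiment(cls, conn, experiment_id):
        # Only rows of this model's type; served by measurements_type_exp.
        rows = conn.execute(
            "SELECT id, experiment_id, value FROM measurements "
            "WHERE measurement_type = ? AND experiment_id = ? ORDER BY id",
            (cls.TYPE_CODE, experiment_id),
        ).fetchall()
        return [cls(*row) for row in rows]


class A(Measurement):
    TYPE_CODE = "A"


class B(Measurement):
    TYPE_CODE = "B"


class C(Measurement):
    TYPE_CODE = "C"


class D(Measurement):
    TYPE_CODE = "D"  # added later: no DDL change needed


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    A.insert(conn, 1, 0.5)
    B.insert(conn, 1, 1.5)
    D.insert(conn, 1, 2.5)
    A.insert(conn, 2, 3.5)
    result = (A.for_experiment(conn, 1), D.for_experiment(conn, 1))
    conn.close()
    return result
```

The composite index on (measurement_type, experiment_id) is just one reasonable choice for per-type, per-experiment lookups; it is not tuned for billions of rows.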

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow