Question

I have a time-series simulation model which has more than 10 input variables. The number of distinct simulation instances would be more than 1 million, and each simulation instance generates a few output rows every day.

To save the simulation result in a relational database, i designed tables like this.

create table SimulationModel (

simul_id integer primary key,

input0 string/numeric,

input1 string/numeric,

...)

create table SimulationOutput (

dt DateTime primary key,

simul_id integer primary key,

output0 numeric,

...)

To make each model be unique, i'm considering some ways like below.

  1. put an unique constraint on all of the input columns of the SimulationModel table

  2. redesign SimulationModel table to group input columns into 2~3 columns and put an unique constraint on these grouped columns. (By making a long string value like "input0_input1_input2")

  3. forget about the server side constraint and do the job in the application side.

How do you think about these options?

Is there no problem to use an unique index on more than 10 columns with millions rows?

Is there some other way to recommend?

(I may use postgresql)

Was it helpful?

Solution

Create the unqiue constraint. 10 columns with a couple of million rows isn't really a large table. Also your tests will presumably benefit from having an index to work with - or at least an index wouldn't do any harm.

OTHER TIPS

If you create two tables for your simulationmodel like this:

Table SimulationModel {
id: integer,
...
}

Table SimulationModelInput {
simulationmodel_id: integer foreign key(SimulationModel.id),
input: string or numeric,
...
}

then you can put a single constrain on (simulationmodel_id, input) but the query to fetch the configuration is more complex.

The unique index is the way to go to make sure each set of inputs is unique.

You could create a single "unique" column that is a hash of the other columns and call that the identity.

I would handle the uniqueness constraint in the application, especially if only one process is creating simulations. You could then have one or more non-unique indexes on commonly-filtered input columns for efficient access.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top