Strategies for storing extra information about models without too many columns (alternatives to DB normalization and model subclassing)

StackOverflow https://stackoverflow.com/questions/22124518

Problem

Say you have a model called Forest. Each object represents a forest on your continent. Some data is common to all these forests, such as forest type and area, and it can easily be represented by columns in the SQL table forest.

However, imagine that these forests have additional data that does not apply to every forest. For example, the 20 coniferous forests have a pine-fir split ratio, whereas the deciduous forests have an autumn duration. One option would be to store all these columns on the main table itself, but then each row would have too many columns, many of them left unfilled by definition.

The most obvious way around this is to make subclasses of the Forest model and have a separate table for each subclass. I feel that's a heavy-handed approach that I would rather not follow: if I need some of the generic forest data, I'll have to consult another table.

Is there a pattern to solve this problem? What solution do you usually prefer?

NOTE: I have seen the other questions about this. The solutions proposed were:

  • Subtyping, same as I proposed above.
  • Have all the columns on the same table.
  • Have separate tables for each kind of forest, with common data such as area and rainfall duplicated in each.

Is there an inventive solution that I don't know of?

UPDATE: I have come across the EAV (entity-attribute-value) model, and also a modified version where the unpredictable fields are stored in a NoSQL/JSON store and their id is held in the relational database. I like both, but I welcome suggestions in this direction.
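For comparison with the answers below, here is a minimal EAV sketch using Python's built-in sqlite3 module. Common attributes stay as real columns on forests, while each type-specific attribute becomes a row in a hypothetical forest_attributes table; the table, column names, and sample rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical EAV schema: common columns live on forests; every
# type-specific attribute is a (forest_id, attr_name, attr_value) row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table forests (
  forest_id integer primary key,
  forest_name text not null,
  forest_type text not null check (forest_type in ('c', 'd')),
  area_sq_km integer not null check (area_sq_km > 0)
);
create table forest_attributes (
  forest_id integer not null references forests (forest_id),
  attr_name text not null,
  attr_value,  -- no declared type, so SQLite stores each value as given
  primary key (forest_id, attr_name)
);
""")

conn.execute("insert into forests values (1, 'North Pines', 'c', 1200)")
conn.execute("insert into forests values (2, 'Maple Vale', 'd', 800)")
conn.executemany(
    "insert into forest_attributes values (?, ?, ?)",
    [(1, "pine_fir_ratio", 0.7), (2, "autumn_duration_days", 45)],
)

def attributes_of(forest_id):
    """Collect the EAV rows for one forest into a dict."""
    rows = conn.execute(
        "select attr_name, attr_value from forest_attributes where forest_id = ?",
        (forest_id,),
    )
    return dict(rows.fetchall())

print(attributes_of(1))  # {'pine_fir_ratio': 0.7}
print(attributes_of(2))  # {'autumn_duration_days': 45}
```

Adding a new kind of attribute needs no schema change, but the database can no longer enforce a type or NOT NULL per attribute, and a query that filters on several attributes needs one join or subquery per attribute.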

No accepted solution

Other tips

On the database side, the best approach is often to store attributes common to all forests in one table, and to store unique attributes in other tables. Build updatable views for clients to use.

create table forests (
  forest_id integer primary key,
  -- Assumes forest names are not unique on a continent.
  forest_name varchar(45) not null,
  forest_type char(1) not null 
    check (forest_type in ('c', 'd')),
  area_sq_km integer not null
    check (area_sq_km > 0),
  -- Other columns common to all forests go here.
  --
  -- This constraint lets foreign keys target the pair
  -- of columns, guaranteeing that a row in each subtype
  -- table references a row here having the same subtype.
  unique (forest_id, forest_type)
);

create table coniferous_forests_subtype (
  forest_id integer primary key,
  forest_type char(1) not null
    default 'c'
    check (forest_type = 'c'),
  pine_fir_ratio float not null
    check (pine_fir_ratio >= 0),
  foreign key (forest_id, forest_type)
    references forests (forest_id, forest_type)
);

create table deciduous_forests_subtype (
  forest_id integer primary key,
  forest_type char(1) not null
    default 'd'
    check (forest_type = 'd'),
  autumn_duration_days integer not null
    check (autumn_duration_days between 20 and 100),
  foreign key (forest_id, forest_type)
    references forests (forest_id, forest_type)
);
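To see what the unique (forest_id, forest_type) constraint buys, here is a minimal sketch that runs trimmed copies of forests and coniferous_forests_subtype in SQLite through Python's sqlite3 module (the sample rows and the helper try_insert are hypothetical). SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. Registering a deciduous forest as coniferous fails because forests has no row matching the pair (forest_id, 'c').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces FKs only when asked
conn.executescript("""
create table forests (
  forest_id integer primary key,
  forest_name varchar(45) not null,
  forest_type char(1) not null check (forest_type in ('c', 'd')),
  area_sq_km integer not null check (area_sq_km > 0),
  unique (forest_id, forest_type)
);
create table coniferous_forests_subtype (
  forest_id integer primary key,
  forest_type char(1) not null default 'c' check (forest_type = 'c'),
  pine_fir_ratio float not null check (pine_fir_ratio >= 0),
  foreign key (forest_id, forest_type)
    references forests (forest_id, forest_type)
);
""")

conn.execute("insert into forests values (1, 'North Pines', 'c', 1200)")
conn.execute("insert into forests values (2, 'Maple Vale', 'd', 800)")

# Allowed: forest 1 is coniferous, and forest_type defaults to 'c'.
conn.execute(
    "insert into coniferous_forests_subtype (forest_id, pine_fir_ratio) values (1, 0.7)"
)

def try_insert(forest_id):
    """Return True if forest_id can be added to the coniferous subtype table."""
    try:
        conn.execute(
            "insert into coniferous_forests_subtype (forest_id, pine_fir_ratio)"
            " values (?, 0.5)",
            (forest_id,),
        )
        return True
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
        return False

print(try_insert(2))  # False: forest 2 is 'd', so (2, 'c') has no parent row
```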

Clients usually use updatable views, one for each subtype, instead of using the base tables. (You can revoke privileges on the base subtype tables to guarantee this.) You might want to omit the "forest_type" column.

create view coniferous_forests as 
select t1.forest_id, t1.forest_name, t1.forest_type, t1.area_sq_km,
       t2.pine_fir_ratio
from forests t1
inner join coniferous_forests_subtype t2
        on t1.forest_id = t2.forest_id;

create view deciduous_forests as 
select t1.forest_id, t1.forest_name, t1.forest_type, t1.area_sq_km,
       t2.autumn_duration_days
from forests t1
inner join deciduous_forests_subtype t2
        on t1.forest_id = t2.forest_id;

What you have to do to make these views updatable varies a little with the DBMS, but expect to write some triggers (not shown). You'll need triggers to handle all the DML actions: insert, update, and delete.
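As one example of such a trigger, here is a minimal SQLite sketch (run through Python's sqlite3 module) of an INSTEAD OF INSERT trigger that makes the coniferous_forests view insertable by splitting each row between the two base tables. It assumes the view exposes forest_name, since that column is NOT NULL in forests, and that the caller supplies forest_id; the trigger name and sample row are hypothetical. Trigger syntax differs between DBMSs, so treat this as SQLite-specific; update and delete triggers would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")
conn.executescript("""
create table forests (
  forest_id integer primary key,
  forest_name varchar(45) not null,
  forest_type char(1) not null check (forest_type in ('c', 'd')),
  area_sq_km integer not null check (area_sq_km > 0),
  unique (forest_id, forest_type)
);
create table coniferous_forests_subtype (
  forest_id integer primary key,
  forest_type char(1) not null default 'c' check (forest_type = 'c'),
  pine_fir_ratio float not null check (pine_fir_ratio >= 0),
  foreign key (forest_id, forest_type)
    references forests (forest_id, forest_type)
);
create view coniferous_forests as
select t1.forest_id, t1.forest_name, t1.forest_type, t1.area_sq_km,
       t2.pine_fir_ratio
from forests t1
inner join coniferous_forests_subtype t2
        on t1.forest_id = t2.forest_id;

-- Hypothetical trigger: route an insert on the view to both base tables.
-- The subtype is fixed to 'c' here, whatever the caller passes.
create trigger coniferous_forests_insert
instead of insert on coniferous_forests
begin
  insert into forests (forest_id, forest_name, forest_type, area_sq_km)
  values (new.forest_id, new.forest_name, 'c', new.area_sq_km);
  insert into coniferous_forests_subtype (forest_id, pine_fir_ratio)
  values (new.forest_id, new.pine_fir_ratio);
end;
""")

conn.execute(
    "insert into coniferous_forests (forest_id, forest_name, area_sq_km, pine_fir_ratio)"
    " values (1, 'North Pines', 1200, 0.7)"
)
print(conn.execute("select * from coniferous_forests").fetchall())
# [(1, 'North Pines', 'c', 1200, 0.7)]
```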

If you need to report only on columns that appear in "forests", then just query the table "forests".

Well, the easiest way is to put all the columns into one table and use a "type" field to decide which columns apply. This works for smaller tables, but in more complicated cases it can lead to a big, messy table and problems with database constraints (for example, type-specific columns have to allow NULLs, so you can't simply declare them NOT NULL).
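A minimal sketch of that single-table variant in SQLite via Python's sqlite3 (the table name forests_flat, the helper accepts, and the sample rows are hypothetical). Since the type-specific columns must be nullable, one partial workaround is a table-level CHECK that ties which columns are filled to forest_type:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table forests_flat (
  forest_id integer primary key,
  forest_name text not null,
  forest_type text not null check (forest_type in ('c', 'd')),
  area_sq_km integer not null check (area_sq_km > 0),
  -- Type-specific columns: nullable, because each applies to one type only.
  pine_fir_ratio real,
  autumn_duration_days integer,
  -- Emulate per-type NOT NULL: exactly the columns of the row's type are filled.
  check (
    (forest_type = 'c' and pine_fir_ratio is not null and autumn_duration_days is null)
    or
    (forest_type = 'd' and autumn_duration_days is not null and pine_fir_ratio is null)
  )
);
""")

conn.execute("insert into forests_flat values (1, 'North Pines', 'c', 1200, 0.7, null)")

def accepts(row):
    """Return True if the row satisfies the table's constraints."""
    try:
        conn.execute("insert into forests_flat values (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(accepts((2, 'Maple Vale', 'd', 800, None, 45)))  # True
print(accepts((3, 'Odd Wood', 'd', 500, 0.4, None)))   # False: wrong column filled
```

The CHECK grows with every new forest type, which is the "big messy table" problem in practice.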

My preferred method would be something like this:

  • A generic "Forests" table with: id, type, [generic_columns, ...]
  • A "Coniferous_Forests" table with: id, forest_id (FK to Forests), ...

So, to get all the data for the coniferous forest with id 1, you'd use a query like this:

SELECT * FROM Coniferous_Forests INNER JOIN Forests
ON Coniferous_Forests.forest_id = Forests.id
WHERE Coniferous_Forests.id = 1;
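A runnable version of this two-table layout, as a minimal sketch in SQLite via Python's sqlite3 (area_sq_km and pine_fir_ratio stand in for the generic and subtype columns; they and the sample rows are hypothetical). The query lists columns explicitly because both tables have an id column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Forests (
  id integer primary key,
  type text not null,
  area_sq_km integer not null          -- stands in for the generic columns
);
create table Coniferous_Forests (
  id integer primary key,
  forest_id integer not null references Forests (id),
  pine_fir_ratio real not null         -- stands in for the subtype columns
);
insert into Forests values (10, 'c', 1200);
insert into Coniferous_Forests values (1, 10, 0.7);
""")

row = conn.execute("""
  select Coniferous_Forests.id, Forests.type, Forests.area_sq_km,
         Coniferous_Forests.pine_fir_ratio
  from Coniferous_Forests
  inner join Forests on Coniferous_Forests.forest_id = Forests.id
  where Coniferous_Forests.id = 1
""").fetchone()
print(row)  # (1, 'c', 1200, 0.7)
```

Unlike a composite foreign key on (id, type), nothing in this layout stops a Coniferous_Forests row from pointing at a forest whose type is not coniferous, or two subtype rows from pointing at the same forest.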

As for inventive solutions, there is such a thing as an OODBMS (Object-Oriented Database Management System).

The most popular alternatives to relational SQL databases are document-oriented NoSQL databases such as MongoDB. Storing data in them is comparable to storing JSON objects, and it gives you more flexibility in which fields each record has.
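The JSON idea can also be approximated inside a relational database, roughly the hybrid mentioned in the question's update. A minimal sketch, assuming a hypothetical extra_json text column and hypothetical helpers add_forest and load_forest, using Python's json and sqlite3 modules rather than a document store such as MongoDB: common attributes stay as real columns, and the type-specific ones are serialized into one JSON document per row. Stored as plain text like this, the document's fields cannot be validated or indexed by the database.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table forests (
  forest_id integer primary key,
  forest_type text not null,
  area_sq_km integer not null,
  extra_json text not null default '{}'   -- hypothetical: free-form attributes
)
""")

def add_forest(forest_id, forest_type, area_sq_km, **extra):
    """Store the common fields as columns and everything else as JSON."""
    conn.execute(
        "insert into forests values (?, ?, ?, ?)",
        (forest_id, forest_type, area_sq_km, json.dumps(extra)),
    )

def load_forest(forest_id):
    """Return one forest as a dict, merging the JSON fields back in."""
    forest_type, area, extra = conn.execute(
        "select forest_type, area_sq_km, extra_json from forests where forest_id = ?",
        (forest_id,),
    ).fetchone()
    return {"forest_id": forest_id, "forest_type": forest_type,
            "area_sq_km": area, **json.loads(extra)}

add_forest(1, "c", 1200, pine_fir_ratio=0.7)
add_forest(2, "d", 800, autumn_duration_days=45)
print(load_forest(2))
# {'forest_id': 2, 'forest_type': 'd', 'area_sq_km': 800, 'autumn_duration_days': 45}
```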

License: CC-BY-SA with attribution
Not affiliated with StackOverflow