database design (lists of many different items, with custom fields)

https://stackoverflow.com/questions/11205051

17-06-2021

Question

I’m working on a project that deals with all kinds of items. What the items are is of no importance; it’s the database design I’m worried about. If someone could give me some insight into how I should lay out my database for this, or just point me in the right direction, I would be most thankful.

All kinds of items in one list
Imagine you have lists of items. You could have a list of CDs, a list of DVDs and a list of books. In database terms, this translates to one list having many items, with the id of the list stored in the item row.
But what if you wanted a list of all Super Mario related stuff, containing soundtrack DVDs, that horrible live action film and some fanfiction novels based on the plumber’s life?
I suddenly realized, when drawing out my database, that those items, which belong to the same list, couldn’t be in the same table, as they would all need different columns to support artist/album title, director/movie title, author/novel title, etc., which I couldn’t possibly put in one giant table.
On top of that, I want to have the track titles of the soundtrack albums and the actors of the film in my database. If I had only CDs, I could easily attach an album_track table to my item table, but I can’t just attach all kinds of different tables to my item table, as that wouldn’t be good for performance if I wanted to get all items with all their details for a certain list. The query would have to search all attached tables for references to the list, even if the list doesn’t contain any books, vinyls, manga, TV series, plants, furniture, etc.

What I have right now is the following layout (but I can’t imagine this is the best way to do this):

t_list (id) --> t_item (id, id_list, image)

t_item --> t_cd (id, id_item, artist, title)
t_item --> t_dvd (id, id_item, director, title)
t_item --> …

t_cd --> t_cd_track (id, id_cd, track_title, length)
t_dvd --> t_dvd_actors (id, id_dvd, actor_name, image)
…

Custom columns
Now, imagine that to add these items to a CD list, you’d have a form with input fields matching the columns in the table t_cd (artist, album title, genre, …). I want to be able to add a custom input field, for example for the average price of albums.
This is set for a certain user for a certain list. It is not set at the item level, because that would mean it would be added to everyone’s form. I just want to add that field to my own CD list.
But it still needs to be related to items, because that value needs to be stored in the database.

I’m thinking about something like this:

t_list (id) --> t_extra_field (id, description, id_list)
t_extra_field --> t_field_value (id, id_extra_field, value)

But I’m not entirely sure where to attach this in my database scheme.

Could this kind of structure also be an answer to my previous question? (t_field --> t_field_value) If so, I also don’t know where to attach that. Perhaps to list, like I suggested in the above example?
That would mean that all details for a certain item are in one table, but value by value rather than in a single record, according to a category id of some sort coming from another table attached to item. That would be a table with a lot of records, which again raises my question: isn’t this bad for performance?

I sincerely hope someone can give me some insight into the matter.

Solution

A completely generic database is probably a bad idea - it usually means you have to enforce the data consistency completely at the application level. This might be justified for highly "untyped" or "volatile" data when you want to avoid DDL at run-time, but the data you describe here looks "typed" enough for a more conventional database design.

Judging by your description, you'd need something similar to this:

[Image: database diagram, not reproduced here]

In the diagram, the category symbol (image not reproduced here) denotes the "category" (aka inheritance, sub-type, generalization hierarchy, etc.).

For the specific cases where we know exactly how the items should be connected, we can model that directly through a link (aka junction) table between specific sub-types, as in the case of the TRACK table.

Also, we can group items of different kinds through GROUP and GROUP_ITEM (so, say, Mario soundtracks, movies and books can be grouped together under the same GROUP_ID).

Artists are also handled in a fairly general way, so we can easily represent a situation where (for example) the same person writes both a song and a book.
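
Since the diagram itself is not available, here is a rough sketch of what such a layout might look like, written with Python's built-in sqlite3 module. Only TRACK, GROUP, GROUP_ITEM and GROUP_ID come from the description above; ITEM, ALBUM, SONG, MOVIE, BOOK, every column name and the sample rows are assumptions made for illustration, and the artist tables are left out for brevity.

```python
import sqlite3

# Only TRACK, GROUP, GROUP_ITEM and GROUP_ID are named in the answer; every other
# table and column below is a hypothetical stand-in for the missing diagram.
SCHEMA = """
CREATE TABLE ITEM (
    ITEM_ID INTEGER PRIMARY KEY,
    TITLE   TEXT NOT NULL
);
-- Category (sub-type) tables: each row shares its key with one ITEM row.
CREATE TABLE ALBUM (ITEM_ID INTEGER PRIMARY KEY REFERENCES ITEM(ITEM_ID));
CREATE TABLE SONG  (ITEM_ID INTEGER PRIMARY KEY REFERENCES ITEM(ITEM_ID));
CREATE TABLE MOVIE (ITEM_ID INTEGER PRIMARY KEY REFERENCES ITEM(ITEM_ID));
CREATE TABLE BOOK  (ITEM_ID INTEGER PRIMARY KEY REFERENCES ITEM(ITEM_ID));
-- Junction table between two specific sub-types.
CREATE TABLE TRACK (
    ALBUM_ID INTEGER NOT NULL REFERENCES ALBUM(ITEM_ID),
    SONG_ID  INTEGER NOT NULL REFERENCES SONG(ITEM_ID),
    TRACK_NO INTEGER NOT NULL,
    PRIMARY KEY (ALBUM_ID, TRACK_NO)
);
-- GROUP is a reserved word in SQL, hence the quotes.
CREATE TABLE "GROUP" (
    GROUP_ID INTEGER PRIMARY KEY,
    NAME     TEXT NOT NULL
);
CREATE TABLE GROUP_ITEM (
    GROUP_ID INTEGER NOT NULL REFERENCES "GROUP"(GROUP_ID),
    ITEM_ID  INTEGER NOT NULL REFERENCES ITEM(ITEM_ID),
    PRIMARY KEY (GROUP_ID, ITEM_ID)
);
"""


def build_demo():
    """Create an in-memory database with one mixed group of made-up items."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ITEM (ITEM_ID, TITLE) VALUES (?, ?)",
        [(1, "Mario soundtrack"), (2, "Main theme"),
         (3, "Mario live-action film"), (4, "Mario fan-fiction novel")],
    )
    conn.execute("INSERT INTO ALBUM (ITEM_ID) VALUES (1)")
    conn.execute("INSERT INTO SONG (ITEM_ID) VALUES (2)")
    conn.execute("INSERT INTO MOVIE (ITEM_ID) VALUES (3)")
    conn.execute("INSERT INTO BOOK (ITEM_ID) VALUES (4)")
    conn.execute("INSERT INTO TRACK (ALBUM_ID, SONG_ID, TRACK_NO) VALUES (1, 2, 1)")
    conn.execute("INSERT INTO \"GROUP\" (GROUP_ID, NAME) VALUES (1, 'Super Mario')")
    conn.executemany(
        "INSERT INTO GROUP_ITEM (GROUP_ID, ITEM_ID) VALUES (1, ?)",
        [(1,), (3,), (4,)],
    )
    conn.commit()
    return conn


def titles_in_group(conn, group_id):
    """List a group's item titles without touching any sub-type table."""
    rows = conn.execute(
        "SELECT I.TITLE FROM GROUP_ITEM GI "
        "JOIN ITEM I ON I.ITEM_ID = GI.ITEM_ID "
        "WHERE GI.GROUP_ID = ? ORDER BY I.ITEM_ID",
        (group_id,),
    )
    return [title for (title,) in rows]
```

Listing a group only joins GROUP_ITEM to ITEM, so sub-type tables that the group never uses are not searched; the details of a given item would be fetched from its own sub-type table when needed.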

As for things such as "average price of albums", ideally you shouldn't store them at all - you should calculate them when needed, based on the existing data, so the possibility of an out-of-date result is eliminated.
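
As a minimal illustration of calculating the value when needed, the sketch below uses Python's built-in sqlite3 module. The ALBUM table, its PRICE column and the sample prices are hypothetical, chosen only to make the example self-contained.

```python
import sqlite3

# Hypothetical table and data, used only to demonstrate the idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ALBUM (ALBUM_ID INTEGER PRIMARY KEY, PRICE REAL NOT NULL)")
conn.executemany("INSERT INTO ALBUM (PRICE) VALUES (?)", [(10.0,), (20.0,), (30.0,)])


def average_price(conn):
    """Compute the average from the current rows on every call."""
    return conn.execute("SELECT AVG(PRICE) FROM ALBUM").fetchone()[0]


first = average_price(conn)
conn.execute("INSERT INTO ALBUM (PRICE) VALUES (40.0)")
second = average_price(conn)  # reflects the new row without any extra bookkeeping
```

Because nothing is stored, there is no cached copy that could drift away from the underlying rows.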

If this becomes problematic performance-wise, either:

- do it periodically, cache the result and live with the somewhat out-of-date result,
- or cache the result whenever the data is modified (through triggers), but do it very carefully to avoid anomalies in the concurrent environment.

For example...
```
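-- Step 1: read the average (other sessions may change TABLE1 right after this).
SELECT AVG(PRICE) FROM TABLE1;
-- Step 2: store the value read in step 1 (placeholder, not literal SQL).
INSERT INTO TABLE2 (AVERAGE_PRICE) VALUES (result_of_the_previous_query);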
```
...is almost certainly unsafe, but depending on the DBMS even...
```
-- Single statement: compute and store the average in one go.
INSERT INTO TABLE2 (AVERAGE_PRICE) SELECT AVG(PRICE) FROM TABLE1;
```
...might not be completely safe without proper locking. You'll need to learn about your DBMS's transaction isolation and locking.
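
What "proper locking" looks like depends heavily on the DBMS. Purely as an illustration, here is how it might be done in SQLite through Python's sqlite3 module, where BEGIN IMMEDIATE takes the database write lock before the average is read, so no other connection can write in between. TABLE1, TABLE2 and their columns follow the snippets above; the sample prices and the store_average helper are assumptions, and other systems would use their own mechanisms instead.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN/COMMIT statements below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE TABLE1 (PRICE REAL NOT NULL)")
conn.execute("CREATE TABLE TABLE2 (AVERAGE_PRICE REAL)")
conn.executemany("INSERT INTO TABLE1 (PRICE) VALUES (?)", [(5.0,), (15.0,)])


def store_average(conn):
    """Hypothetical helper: snapshot the average while holding the write lock."""
    conn.execute("BEGIN IMMEDIATE")  # SQLite-specific: acquire the write lock now
    try:
        conn.execute(
            "INSERT INTO TABLE2 (AVERAGE_PRICE) SELECT AVG(PRICE) FROM TABLE1"
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


store_average(conn)
```

An in-memory database has only one connection, so this shows the shape of the code rather than actual contention between sessions.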

In the specific case of calculating an average, there are other tricks that you might consider, such as separately incrementing/decrementing the COUNT and adding/subtracting the SUM of the price through triggers with each INSERT/UPDATE/DELETE, and then calculating the AVG on the fly. SQL guarantees that an assignment such as SET MY_COUNT = MY_COUNT + 1 within a single UPDATE statement will be "atomic".
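
The COUNT/SUM trick could be sketched roughly as follows, again using SQLite via Python's sqlite3 module. MY_COUNT comes from the text above; the ALBUM and ALBUM_STATS tables, the MY_SUM column and the trigger names are hypothetical, and trigger syntax differs between DBMSs.

```python
import sqlite3

# Hypothetical schema: ALBUM holds the prices, ALBUM_STATS holds running totals.
SCHEMA = """
CREATE TABLE ALBUM (ALBUM_ID INTEGER PRIMARY KEY, PRICE REAL NOT NULL);
-- Single-row table with the running count and sum.
CREATE TABLE ALBUM_STATS (MY_COUNT INTEGER NOT NULL, MY_SUM REAL NOT NULL);
INSERT INTO ALBUM_STATS (MY_COUNT, MY_SUM) VALUES (0, 0);

CREATE TRIGGER ALBUM_AI AFTER INSERT ON ALBUM BEGIN
    UPDATE ALBUM_STATS SET MY_COUNT = MY_COUNT + 1, MY_SUM = MY_SUM + NEW.PRICE;
END;
CREATE TRIGGER ALBUM_AD AFTER DELETE ON ALBUM BEGIN
    UPDATE ALBUM_STATS SET MY_COUNT = MY_COUNT - 1, MY_SUM = MY_SUM - OLD.PRICE;
END;
CREATE TRIGGER ALBUM_AU AFTER UPDATE OF PRICE ON ALBUM BEGIN
    UPDATE ALBUM_STATS SET MY_SUM = MY_SUM - OLD.PRICE + NEW.PRICE;
END;
"""


def make_db():
    """Create an in-memory database with the tables and triggers above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def cached_average(conn):
    """Derive the average from the running totals; None when there are no rows."""
    # NULLIF avoids dividing by zero when the table is empty.
    return conn.execute(
        "SELECT MY_SUM / NULLIF(MY_COUNT, 0) FROM ALBUM_STATS"
    ).fetchone()[0]
```

Reading the average then touches a single row regardless of how many albums exist. With fractional prices, keeping the sum in integer cents rather than a floating-point column would avoid rounding drift as rows are added and removed.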

Licensed under: CC-BY-SA with attribution
