How to store a (record which holds a) reference to any other column (attribute) in another table (relation)

https://dba.stackexchange.com/questions/273530

06-03-2021

Question

TL;DR: If the database schema should hold all the business logic, how is it possible to specify that an attribute's type is a reference to a specific attribute, instead of to a specific record (as is the case with a foreign key)?

As an example, suppose I have a table "Discounts" with a column "share", which holds the percentage to be applied to the value of column "cost", "price" or "shipping" of the table "Items".

"Discounts" also holds a foreign key column "item_id" referencing "Items".

I need to add another column "base" to table "Discounts", in which to store a reference to one of the columns of table "Items", and then calculate the percentage of that column's value.
For example, given these values:

Discounts
share    base                 item_id
-------------------------------------
50       (item's cost)        3
25       (item's price)       1
100      (item's shipping)    2


Items
id    cost    price    shipping
-------------------------------
1     10      40       20
2     55      60       30
3     50      85       10

I want to be able to calculate:

50% of 50 (cost of item 3)
25% of 40 (price of item 1)
100% of 30 (shipping of item 2)

The column "base" should contain neither the position (e.g. 3) nor the name (e.g. "price") of the referenced column, because both the names and the order of a table's columns could change. In particular, a relational database has no notion of column (attribute) order or row (record/tuple) order; in fact, relational theory asserts that «the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes.»

If instead we rely on the column names, we must enforce that each entry holds a valid attribute name, and whenever an attribute is renamed, we must update the affected records, constraints and application validations. If the name is referenced from multiple relations, maintaining the database's integrity becomes very complex.

The problem here is that the reference to the attribute name is written not in the database schema (as when we add a foreign key) but in the data itself, which seems like very bad practice, since it undermines referential integrity.
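To make the fragility concrete, here is a minimal sketch of the approach the question rejects: "base" stores the column *name*, and the application resolves it with dynamic SQL. It uses Python's built-in sqlite3 module instead of PostgreSQL so it runs anywhere; the lowercase table names and the helpers `setup` and `discount_amounts` are hypothetical. Renaming a column (SQLite 3.25+ supports `RENAME COLUMN`) leaves the stored name dangling, and nothing in the schema notices.

```python
from __future__ import annotations

import sqlite3


def setup() -> sqlite3.Connection:
    """Build the question's sample data, storing 'base' as a plain column name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (id INTEGER PRIMARY KEY, cost REAL, price REAL, shipping REAL);
        CREATE TABLE discounts (share REAL, base TEXT, item_id INTEGER REFERENCES items(id));
        INSERT INTO items VALUES (1, 10, 40, 20), (2, 55, 60, 30), (3, 50, 85, 10);
        INSERT INTO discounts VALUES (50, 'cost', 3), (25, 'price', 1), (100, 'shipping', 2);
        """
    )
    return conn


def discount_amounts(conn: sqlite3.Connection) -> list[tuple[int, float]]:
    """Return (item_id, share% of the referenced column) for every discount."""
    # Column names that currently exist, read from the catalog at query time.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(items)")}
    rows = conn.execute("SELECT share, base, item_id FROM discounts ORDER BY item_id").fetchall()
    results = []
    for share, base, item_id in rows:
        if base not in columns:
            # The schema cannot prevent this: 'base' is just text in a data row.
            raise LookupError(f"discount for item {item_id} refers to unknown column {base!r}")
        # Interpolating is safe only because 'base' was checked against the catalog above.
        (value,) = conn.execute(f"SELECT {base} FROM items WHERE id = ?", (item_id,)).fetchone()
        results.append((item_id, share / 100 * value))
    return results


conn = setup()
print(discount_amounts(conn))  # [(1, 10.0), (2, 30.0), (3, 25.0)]

conn.execute("ALTER TABLE items RENAME COLUMN price TO list_price")
try:
    discount_amounts(conn)
except LookupError as exc:
    print(exc)  # the stored 'price' no longer matches any column
```

Every rename now requires a matching `UPDATE discounts SET base = ...`, which is exactly the maintenance burden described above.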

If there is no DB agnostic way to do this, then assume the database is PostgreSQL (v12+).

Solution

I think your model misrepresents your use case. The item value element (cost, price or shipping) seems to be an entity in its own right. Without knowing the bigger picture, my model would probably look something like this:

Items
------------------------
Id      | int      | PK
Name    | string   | 
... other attributes

ItemValueElements
------------------------
Item ID | int      | PK, FK
Type    | enum     | PK       -- one of: 'cost', 'price', 'shipping'
Value   | currency |

-- (Item ID, Type) would be the primary key here.

Discounts
------------------------
Item ID | int      | FK
Type    | enum     | FK
Share   | decimal  |

-- (Item ID, Type) together form a foreign key to ItemValueElements.
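A minimal, runnable sketch of this model, using Python's sqlite3 so it runs without a PostgreSQL server. SQLite has no enum type, so a `CHECK` constraint stands in for it (in PostgreSQL you might use `CREATE TYPE ... AS ENUM` instead); the snake_case table and column names and the helper names are assumptions. The key point is the composite foreign key: a discount can only reference an (item, type) pair that actually exists, and no column name is stored as data.

```python
import sqlite3

SCHEMA = """
PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when asked to

CREATE TABLE items (
    id   INTEGER PRIMARY KEY,
    name TEXT
);

-- One row per (item, value element); the CHECK stands in for the enum.
CREATE TABLE item_value_elements (
    item_id INTEGER NOT NULL REFERENCES items (id),
    type    TEXT    NOT NULL CHECK (type IN ('cost', 'price', 'shipping')),
    value   NUMERIC NOT NULL,
    PRIMARY KEY (item_id, type)
);

-- The composite foreign key ties each discount to one existing value element.
CREATE TABLE discounts (
    item_id INTEGER NOT NULL,
    type    TEXT    NOT NULL,
    share   NUMERIC NOT NULL,
    FOREIGN KEY (item_id, type) REFERENCES item_value_elements (item_id, type)
);

INSERT INTO items (id) VALUES (1), (2), (3);
INSERT INTO item_value_elements VALUES
    (1, 'cost', 10), (1, 'price', 40), (1, 'shipping', 20),
    (2, 'cost', 55), (2, 'price', 60), (2, 'shipping', 30),
    (3, 'cost', 50), (3, 'price', 85), (3, 'shipping', 10);
INSERT INTO discounts VALUES (3, 'cost', 50), (1, 'price', 25), (2, 'shipping', 100);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def discount_amounts(conn):
    # A plain join: the discounted base is found by value, not by column name.
    return conn.execute(
        """
        SELECT d.item_id, d.type, d.share * v.value / 100.0 AS amount
        FROM discounts AS d
        JOIN item_value_elements AS v
          ON v.item_id = d.item_id AND v.type = d.type
        ORDER BY d.item_id
        """
    ).fetchall()


conn = build()
print(discount_amounts(conn))
# [(1, 'price', 10.0), (2, 'shipping', 30.0), (3, 'cost', 25.0)]

try:
    # No 'handling' element exists for item 1, so the foreign key rejects this row.
    conn.execute("INSERT INTO discounts VALUES (1, 'handling', 10)")
except sqlite3.IntegrityError as exc:
    print(exc)  # FOREIGN KEY constraint failed
```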

In reality, discounts do not normally apply to individual items but to classes of items, identified by various means, and your actual model would change accordingly.

If that is not the case for you, and you really do want a discount per item, you can simply merge ItemValueElements and Discounts into one entity.

Other answers

As a #database-design question (as you emphasize), this is heading in the wrong direction. The proper approach in the relational world is to design your database so that column names don't change. Luckily, PostgreSQL has many NoSQL features you can use.

Also, if the number of attributes is not fixed, don't implement them as columns of one table; add a level of abstraction instead, either with one or two more tables or with a JSONB column holding the attributes.

Whichever way you go, this will add complexity to your design, making ad hoc queries harder to write and the design harder to implement without errors. Please consider once more whether this is what you really need. It can certainly be done if you can't do without it, but it will take more time and advance planning than fits in this question.

If you decide to make it, make use of FUNCTIONs to calculate the discounts and make VIEWs so you don't have to type those long queries too often. If attribute names must change, map them to something fixed in a separate table.
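One way to follow this advice, sketched with Python's sqlite3 (the table `base_codes`, the view `discount_amounts` and the single-letter codes are assumptions): the codes stored in the data are fixed, a small table maps them to human-readable names, and a VIEW holds the `CASE` expression that ties each code to a column. Because the column reference lives in the schema rather than in the data, a column rename is carried through by the database itself; SQLite 3.25 or later rewrites views on `RENAME COLUMN`, and PostgreSQL views likewise keep working after a rename because they store the parsed query rather than its text.

```python
import sqlite3

SCHEMA = """
PRAGMA foreign_keys = ON;
CREATE TABLE items (id INTEGER PRIMARY KEY, cost REAL, price REAL, shipping REAL);

-- Fixed codes stored in the data, mapped to names that are free to change.
CREATE TABLE base_codes (code TEXT PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE discounts (
    share   REAL    NOT NULL,
    base    TEXT    NOT NULL REFERENCES base_codes (code),
    item_id INTEGER NOT NULL REFERENCES items (id)
);

-- The only place that knows which code selects which column.
CREATE VIEW discount_amounts AS
SELECT d.item_id,
       b.name AS base_name,
       d.share / 100.0 * CASE d.base
                             WHEN 'C' THEN i.cost
                             WHEN 'P' THEN i.price
                             WHEN 'S' THEN i.shipping
                         END AS amount
FROM discounts AS d
JOIN items AS i ON i.id = d.item_id
JOIN base_codes AS b ON b.code = d.base;

INSERT INTO items VALUES (1, 10, 40, 20), (2, 55, 60, 30), (3, 50, 85, 10);
INSERT INTO base_codes VALUES ('C', 'cost'), ('P', 'price'), ('S', 'shipping');
INSERT INTO discounts VALUES (50, 'C', 3), (25, 'P', 1), (100, 'S', 2);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def amounts(conn):
    return conn.execute(
        "SELECT item_id, base_name, amount FROM discount_amounts ORDER BY item_id"
    ).fetchall()


conn = build()
print(amounts(conn))  # [(1, 'price', 10.0), (2, 'shipping', 30.0), (3, 'cost', 25.0)]

# Renaming the column: SQLite rewrites the view; the discount rows stay untouched.
conn.execute("ALTER TABLE items RENAME COLUMN price TO list_price")
# Renaming the label: one row in the mapping table.
conn.execute("UPDATE base_codes SET name = 'list price' WHERE code = 'P'")
print(amounts(conn))  # [(1, 'list price', 10.0), (2, 'shipping', 30.0), (3, 'cost', 25.0)]
```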

I'm not sure what base is supposed to contain, but I'll assume it is OK to store C for cost, P for price and S for shipping.

select id
     , cost - cost_rebate as cost
     , price - price_rebate as price
     , shipping - shipping_rebate as shipping
from (
    -- only the column selected by d.base gets a non-zero rebate
    select i.id
         , case when d.base = 'C' 
                then d.share / 100.0 
                else 0 
           end * cost as cost_rebate
         , case when d.base = 'P' 
                then d.share / 100.0 
                else 0 
           end * price as price_rebate
         , case when d.base = 'S' 
                then d.share / 100.0 
                else 0 
           end * shipping as shipping_rebate
         , cost
         , price
         , shipping
    from items i
    join discounts d
        on i.id = d.item_id
) as tmp
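The query above can be checked with a small harness (an addition, not part of the original answer): it loads the question's sample data into an in-memory SQLite database through Python's sqlite3 module, storing base as 'C', 'P' or 'S' as assumed, and sorts the rows in Python because the query has no ORDER BY. Note that the query returns each item's values with the discount already subtracted, rather than the discount amount itself.

```python
import sqlite3

# The answer's query, with the CASE expressions condensed onto single lines.
QUERY = """
select id
     , cost - cost_rebate as cost
     , price - price_rebate as price
     , shipping - shipping_rebate as shipping
from (
    select i.id
         , case when d.base = 'C' then d.share / 100.0 else 0 end * cost as cost_rebate
         , case when d.base = 'P' then d.share / 100.0 else 0 end * price as price_rebate
         , case when d.base = 'S' then d.share / 100.0 else 0 end * shipping as shipping_rebate
         , cost
         , price
         , shipping
    from items i
    join discounts d
        on i.id = d.item_id
) as tmp
"""


def discounted_items():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (id INTEGER PRIMARY KEY, cost INTEGER, price INTEGER, shipping INTEGER);
        CREATE TABLE discounts (share INTEGER, base TEXT, item_id INTEGER REFERENCES items(id));
        INSERT INTO items VALUES (1, 10, 40, 20), (2, 55, 60, 30), (3, 50, 85, 10);
        INSERT INTO discounts VALUES (50, 'C', 3), (25, 'P', 1), (100, 'S', 2);
        """
    )
    # The query has no ORDER BY, so sort in Python for a stable result.
    return sorted(conn.execute(QUERY).fetchall())


for row in discounted_items():
    print(row)
# (1, 10, 30.0, 20)
# (2, 55, 60, 0.0)
# (3, 25.0, 85, 10)
```

Items without a discount are dropped by the inner join; a LEFT JOIN would keep them with a zero rebate only if the CASE expressions also handle a NULL base, which they do here because NULL never equals 'C', 'P' or 'S'.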

(Possible solutions, which I don't like by the way)

USE A COLUMN FOR DATA TYPE

Following the suggestion by @a_horse_with_no_name to use a type column to describe the referenced column, a first attempt could be to model cost, price and shipping with two columns each: one holding the type code and the other holding the actual value.

We can design the database like this:

Discounts
share    base_type    item_id
------------------------------------------
50       C            3
25       P            1
100      S            2


Items
id    type_1    value_1    type_2    value_2    type_3    value_3
-----------------------------------------------------------------
1     C         10         P         40         S         20
2     C         55         P         60         S         30
3     C         50         P         85         S         10

And then enforce the values of type_1, type_2 and type_3 with a constraint, either in the application logic or in the database schema (e.g. ALTER TABLE items ADD CONSTRAINT type_1_check CHECK (type_1 IN ('C', 'P', 'S')), and likewise for type_2 and type_3).

The problem with this approach is that each type column still needs to know which value column it belongs to (and vice versa): if value_1 is renamed to "val_1", we would have the same drawbacks as with one column per value.
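A minimal sqlite3 sketch of this layout (lowercase names assumed; SQLite accepts the CHECK constraints inline in CREATE TABLE, while PostgreSQL also accepts the ALTER TABLE form above). The CHECK constraints do reject unknown codes, but the lookup query still has to spell out which type column belongs to which value column, which is the coupling described above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (
    id      INTEGER PRIMARY KEY,
    type_1  TEXT CHECK (type_1 IN ('C', 'P', 'S')),
    value_1 REAL,
    type_2  TEXT CHECK (type_2 IN ('C', 'P', 'S')),
    value_2 REAL,
    type_3  TEXT CHECK (type_3 IN ('C', 'P', 'S')),
    value_3 REAL
);
CREATE TABLE discounts (
    share     REAL,
    base_type TEXT CHECK (base_type IN ('C', 'P', 'S')),
    item_id   INTEGER REFERENCES items (id)
);
INSERT INTO items VALUES
    (1, 'C', 10, 'P', 40, 'S', 20),
    (2, 'C', 55, 'P', 60, 'S', 30),
    (3, 'C', 50, 'P', 85, 'S', 10);
INSERT INTO discounts VALUES (50, 'C', 3), (25, 'P', 1), (100, 'S', 2);
"""

# The (type_n, value_n) pairing is hard-coded here: rename value_1 and this query breaks.
LOOKUP = """
SELECT d.item_id,
       d.share / 100.0 * CASE d.base_type
                             WHEN i.type_1 THEN i.value_1
                             WHEN i.type_2 THEN i.value_2
                             WHEN i.type_3 THEN i.value_3
                         END AS amount
FROM discounts AS d
JOIN items AS i ON i.id = d.item_id
ORDER BY d.item_id
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = build()
print(conn.execute(LOOKUP).fetchall())  # [(1, 10.0), (2, 30.0), (3, 25.0)]

try:
    # 'X' is not an allowed type code, so the CHECK on type_1 rejects the row.
    conn.execute("INSERT INTO items VALUES (4, 'X', 1, 'P', 2, 'S', 3)")
except sqlite3.IntegrityError as exc:
    print(exc)  # a "CHECK constraint failed" message
```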

USE A HASH STRUCTURE TO STORE THE VALUES ALONG WITH THEIR TYPES

A better approach could be to store cost, price and shipping in a single column holding a hash structure (PostgreSQL's hstore type), built from an array of keys and an array of values:

hstore(ARRAY['C','P','S'], ARRAY['10','40','20'])
hstore(ARRAY['C','P','S'], ARRAY['55','60','30'])
hstore(ARRAY['C','P','S'], ARRAY['50','85','10'])

This results in:

Items
id    amounts
-----------------------------------
1     "C"=>"10","P"=>"40","S"=>"20"
2     "C"=>"55","P"=>"60","S"=>"30"
3     "C"=>"50","P"=>"85","S"=>"10"

In this way we could add data types, and name and rename them at will (either in the application or in a separate table with TYPE and NAME columns), and the application could raise an exception to handle the removal of any data type.
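The following is only an analogy, not hstore itself: a pure-Python sketch in which each item's amounts live in one string-to-string map (hstore keys and values are both text), a separate mapping plays the role of the TYPE, NAME table, and a missing key surfaces as an exception the application can handle. All names (`AMOUNTS`, `TYPE_NAMES`, `discount_amount`, `UnknownBaseType`) are hypothetical.

```python
# Hypothetical stand-ins: AMOUNTS mimics the per-item hstore column,
# TYPE_NAMES mimics a separate (TYPE, NAME) table.
AMOUNTS = {
    1: {"C": "10", "P": "40", "S": "20"},
    2: {"C": "55", "P": "60", "S": "30"},
    3: {"C": "50", "P": "85", "S": "10"},
}
TYPE_NAMES = {"C": "cost", "P": "price", "S": "shipping"}


class UnknownBaseType(LookupError):
    """Raised when a discount refers to a type an item no longer stores."""


def discount_amount(item_id: int, base: str, share: float) -> float:
    amounts = AMOUNTS[item_id]
    if base not in amounts:
        raise UnknownBaseType(f"item {item_id} has no {TYPE_NAMES.get(base, base)!r} amount")
    # Values are text, as in hstore, so they must be cast before arithmetic.
    return share / 100 * float(amounts[base])


print(discount_amount(3, "C", 50))   # 25.0
print(discount_amount(1, "P", 25))   # 10.0
print(discount_amount(2, "S", 100))  # 30.0

# Renaming a type touches only the name mapping; stored amounts keep their keys.
TYPE_NAMES["P"] = "list price"

# Removing a type from an item is reported, not silently ignored.
del AMOUNTS[1]["P"]
try:
    discount_amount(1, "P", 25)
except UnknownBaseType as exc:
    print(exc)  # item 1 has no 'list price' amount
```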

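The main drawback of this approach is the performance penalty; in addition, hstore stores values as text, so the amounts must be cast back to numbers before any calculation.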

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange