I went with PaperTrail; it keeps the history of all my models, even their destruction. I could always switch to solution 2 later on if it doesn't scale.
Keep historical integrity of database relations when data changes
Question
I'm hesitating between several alternatives when it comes to relations that have "historical" value.
For example, let's say a User has bought an item at a certain date. If I just store this the classic way, like:
transaction_id: 1
user_id: 2
item_id: 3
created_at: 01/02/2010
Then obviously the user might change their name and the item might change its price, and 3 years later, when I try to create a report of what happened, I have false data.
I have two alternatives:
1. Keep it stupid, as I've shown earlier, but use something like https://github.com/airblade/paper_trail and do something like:
t = Transaction.find(1); u = t.user.version_at(t.created_at)
2. Create tables like transaction_users and transaction_items, and copy the users/items into these tables when a transaction is made. The structure would then become:
transaction_id: 1
transaction_user_id: 2
transaction_item_id: 3
created_at: 01/02/2010
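A minimal plain-Ruby sketch of solution 2, with no Rails or database involved: when a purchase is recorded, the current user and item attributes are copied into snapshot records, so later edits to the live records cannot leak into old transactions. The struct names and the `record_purchase` helper are hypothetical stand-ins for the ActiveRecord models and callback you would actually write, and the names and prices are example values.

```ruby
# Hypothetical live records that can be edited at any time.
User = Struct.new(:id, :name)
Item = Struct.new(:id, :name, :price)

# Hypothetical snapshot records, written once when the transaction is made.
TransactionUser = Struct.new(:user_id, :name)
TransactionItem = Struct.new(:item_id, :name, :price)
Transaction = Struct.new(:id, :transaction_user, :transaction_item, :created_at)

# Copies the current user/item attributes into snapshots (solution 2).
# dup guards the snapshot against in-place mutation of the live strings.
def record_purchase(id, user, item, at)
  Transaction.new(
    id,
    TransactionUser.new(user.id, user.name.dup),
    TransactionItem.new(item.id, item.name.dup, item.price),
    at
  )
end

user = User.new(2, "Alice")       # example values
item = Item.new(3, "Widget", 100)
t = record_purchase(1, user, item, Time.now)

# Later edits to the live records...
user.name = "Alice Smith"
item.price = 50

# ...do not change what the transaction recorded.
puts t.transaction_user.name  # prints Alice
puts t.transaction_item.price # prints 100
```

The trade-off is duplicated data per transaction in exchange for reports that need no history lookup at all.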
Both approaches have their merits, though solution 1 looks much simpler. Do you see a problem with solution 1? How is this "historical data" problem usually solved? I have to solve this problem for 2-3 models in my project; what do you reckon would be the best solution?
Solution 2
OTHER TIPS
Taking the example of Item price, you could also:
- Store a copy of the price at the time in the transaction table
- Create a temporal table for item prices
Storing a copy of the price in the transaction table:
TABLE Transaction(
user_id -- User buying the item
,trans_date -- Date of transaction
,item_no -- The item
,item_price -- A copy of Price from the Item table as-of trans_date
)
Getting the price as of the time of transaction is then simply:
select item_price
from transaction;
Creating a temporal table for item prices:
TABLE item (
item_no
,etcetera -- All other information about the item, such as name, color
,PRIMARY KEY(item_no)
)
TABLE item_price(
item_no
,from_date
,price
,PRIMARY KEY(item_no, from_date)
,FOREIGN KEY(item_no)
REFERENCES item(item_no)
)
The data in the second table would look something like:
ITEM_NO  FROM_DATE   PRICE
=======  ==========  =====
A        2010-01-01    100
A        2011-01-01     90
A        2012-01-01     50
B        2013-03-01     60
This says that from the first of January 2010 the price of item A was 100. It changed to 90 on the first of January 2011, and then again to 50 from the first of January 2012.
You will most likely add a TO_DATE column to the table, even though it's a denormalization (each row's TO_DATE is simply the next FROM_DATE for the same item, and the current price has no TO_DATE yet).
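Because TO_DATE is derivable, it can be computed rather than maintained by hand. A minimal plain-Ruby sketch, assuming the price rows are held as hashes using the column names and data from the ITEM_PRICE example above: rows are grouped per item, sorted by FROM_DATE, and each row's TO_DATE is set to the next row's FROM_DATE, leaving the current price open-ended (`nil`). The `with_to_dates` helper is hypothetical.

```ruby
require "date"

# Price history rows, using the data from the ITEM_PRICE example.
PRICES = [
  { item_no: "A", from_date: Date.new(2010, 1, 1), price: 100 },
  { item_no: "A", from_date: Date.new(2011, 1, 1), price: 90 },
  { item_no: "A", from_date: Date.new(2012, 1, 1), price: 50 },
  { item_no: "B", from_date: Date.new(2013, 3, 1), price: 60 }
].freeze

# Hypothetical helper: sets each row's to_date to the next from_date of the
# same item; the latest row per item gets to_date = nil (still current).
def with_to_dates(rows)
  rows.group_by { |r| r[:item_no] }.flat_map do |_item_no, item_rows|
    sorted = item_rows.sort_by { |r| r[:from_date] }
    sorted.each_with_index.map do |row, i|
      next_row = sorted[i + 1]
      row.merge(to_date: next_row && next_row[:from_date])
    end
  end
end

with_to_dates(PRICES).each do |r|
  puts [r[:item_no], r[:from_date], r[:to_date] || "(open)", r[:price]].join("  ")
end
```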
Finding the price as of the transaction would be something along the lines of:
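select t.item_no
,t.trans_date
,p.price
from transaction t
join item_price p on(
t.item_no = p.item_no
and t.trans_date >= p.from_date
-- TO_DATE equals the next FROM_DATE, so use < rather than BETWEEN to avoid
-- matching two rows on a change date; a NULL TO_DATE marks the current price
and (p.to_date is null or t.trans_date < p.to_date)
);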
ITEM_NO  TRANS_DATE  PRICE
=======  ==========  =====
A        2010-12-31    100
A        2011-01-01     90
A        2011-05-01     90
A        2012-01-01     50
A        2012-05-01     50
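The same as-of lookup can be sketched in plain Ruby, assuming the price history is held in memory rather than queried from the database: for a given item and date, pick the row with the latest FROM_DATE on or before that date. Picking the single latest row means a transaction dated exactly on a price change (such as 2011-01-01) gets only the new price. The `price_at` helper is hypothetical; the data and dates are the ones from the tables above.

```ruby
require "date"

# Price history from the ITEM_PRICE example.
PRICES = [
  { item_no: "A", from_date: Date.new(2010, 1, 1), price: 100 },
  { item_no: "A", from_date: Date.new(2011, 1, 1), price: 90 },
  { item_no: "A", from_date: Date.new(2012, 1, 1), price: 50 },
  { item_no: "B", from_date: Date.new(2013, 3, 1), price: 60 }
].freeze

# Hypothetical helper: returns the price in effect for item_no on date,
# i.e. the row with the greatest from_date <= date (nil if none exists yet).
def price_at(prices, item_no, date)
  row = prices
        .select { |r| r[:item_no] == item_no && r[:from_date] <= date }
        .max_by { |r| r[:from_date] }
  row && row[:price]
end

# The transaction dates from the result table.
%w[2010-12-31 2011-01-01 2011-05-01 2012-01-01 2012-05-01].each do |d|
  puts "A #{d} #{price_at(PRICES, 'A', Date.parse(d))}"
end
```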