Question

Situation

I'm working on making an inventory history for an application that tracks over 200,000 serialized devices. The goal is to be able to look back in time and know exactly where a device was on X day.

I recognize that the ideal situation would likely be to fire a trigger to update the history when an item changes immediately, however that would be a very large scale task and one that would not be implemented easily with the current application.

With that in mind, we have decided to run a nightly script that checks the current inventory and stores that into an inventory_history table that tracks the location, status, etc of the inventory. Originally, we tried just plopping in the history every single day. IE inserting 200,000 rows each day which results in over a million records every 5 days. We found that this would result in GBs of data in less than a year. My proposed solution is to implement it in a version control style history. So instead of inserting 200,000 records every day, only insert ones that have changed. (And insert deleted records for ones that were removed.)

Question(s)

  • Is there any obvious problems to this approach? Is there a better alternative approach to this for an already built application not designed for history?
  • If this approach is a good one, what needs to be implemented that I might be missing? Currently I have the following scenarios implemented:
    • Insert, if none exist with the exact same values already.
    • Insert delete record if device was not found on that day.
    • When Selecting, group by device id using the most recent date allowed by the history search. (If we want to know the inventory status on Jan 1st, 2014, don't select any records that happened after that, but group the records so that the ones that show up are the most recent.)

Notes

When we are looking at the history, we sometimes want to know a specific device, and other times we want an inventory summary report for that day.

Was it helpful?

Solution

You may want to create two history tables, the first quickly determines what changed from the last time the process was run and the second tracking the history of inventory changes.

The first table is a copy of your inventory table as it existed last time you ran this nightly (daily, hourly, etc.) process. (I would add a datetime field to track when the that process was last run). This table along with your real inventory table determine what has been INSERTED, DELETED or UPDATED by using the Serial Number and relevant fields (location, status, etc).

The second table is the inventory change log (meaning SERIAL XXX has multiple entries). Populate this table whenever inventory changes by copying the record found in the previous table to this table. Add another field: ACTION with values (INSERT, UPDATE, DELETE).

Pseudo code:

To populate inserts and updates

INSERT INTO inventory_transaction (serial_number,lastupdated,ACTION, location, status)
SELECT inventory.serial_number, NOW()
  , IF(inventory_history.serial_number IS NULL, 'INSERT', 'UPDATE')
  , inventory.location
  , inventory.status
FROM inventory
LEFT JOIN inventory_history
  ON inventory.serial_number = inventory_history.serial_number
WHERE NOT (inventory.location  <=> inventory_history.location
           AND inventory.status <=> inventory_history.status);

Deletes:

INSERT INTO inventory_transaction (serial_number, lastupdated, ACTION)
SELECT inventory_history.serial_number, NOW(), 'DELETE'
FROM inventory_history
LEFT JOIN inventory
  ON inventory.serial_number = inventory_history.serial_number
WHERE inventory.serial_number IS NULL;

Populate the inventory inventory_history table:

TRUNCATE TABLE inventory_history;
INSERT INTO inventory_history(serial_number,lastchecked, location, status)
SELECT serial_number, NOW(), location, status
FROM inventory;

To see what the inventory looked like at a given point in time:

SELECT inventory_transaction.*
FROM (SELECT serial_number, MAX(lastupdated) as last_date
      FROM inventory_transaction
      WHERE lastupdated <= 'point in time'
      GROUP BY serial_number) AS correct_time
JOIN inventory_transaction
  ON inventory_transaction.serial_number = correct_time.serial_number
    AND inventory_transaction.lastupdated = correct_time.lastupdated
    AND inventory_transaction.ACTION <> 'DELETED'
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top