Question

I was looking for advice on how to store data in a database for historical data mining purposes. If I can get the state of an entity at a given time, what is the best way to store it so that I can historically mine that data, predicting what the state is likely to be based on how it has been in the past?

For a more concrete example, I can get the up/down state of a server, and that server's current load. I can get this on a periodic schedule. I'd like to store this data such that I could easily query for the up/down state or the load for a specific timeframe, or get the entire history of that server. I don't have much experience, if any, in database design.

Was it helpful?

Solution

This is why Ralph Kimball (and others) invented the Data Warehouse.

You have star schema with dimensions like Server and Time. You have a fact table that records state changes (Up and Down) and a fact table that records Load at a given point in time.

Step 1. Find a good tutorial on star schema design. If necessary buy a book or two. It's worth every moment you spend learning how to do star schema data modeling.

Step 2. Build a prototype schema as cheaply and quickly as you can. Get data loaded so you can write some queries and see how things work. Make mistakes. Fix them.

Step 3. After you get something to work, then write down a good design.

Step 4. Build your "for real" data mart.

Step 5. Build your "production" loads.

Step 6. Query.

OTHER TIPS

This is why there is a new version of the standard, SQL:2011.

Just add the appropriate start- and end-date attributes to your entities, and/or the appropriate columns to your tables, and basically you're done.

The new SQL will do quite a bit (but not all alas) of the otherwise very pesky work for you, if you have an engine that supports the new features, of course.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top