Metadata reporting database - what type of DB schema should I use? [closed]

https://stackoverflow.com/questions/21825077

12-10-2022

Question

I've been tasked with a project to collect server configuration metadata from Windows servers and store it in a DB for reporting. I will be collecting over 100 configuration fields for each server.

One of the things the client wants to be able to do is compare config data, either for the same server at different points in time or for two different servers that have the same function (e.g. Exchange servers), to see whether there are any differences and what those differences are.
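As a minimal sketch of the comparison itself, assume each collection run has already been loaded into a Python dict that maps a configuration field name to its collected value (the field and server names below are hypothetical, not the asker's actual fields). The same function then covers both cases the client wants: two snapshots of one server taken at different times, or snapshots of two servers with the same role.

```python
from typing import Any, Dict, Tuple

# Hypothetical representation: one snapshot = {config field name: collected value}.
Snapshot = Dict[str, Any]

# Placeholder reported for a field that exists in only one of the two snapshots.
MISSING = "<missing>"


def diff_snapshots(left: Snapshot, right: Snapshot) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (left_value, right_value)} for every field whose values differ.

    Fields are visited in sorted order so the report is stable between runs.
    """
    differences: Dict[str, Tuple[Any, Any]] = {}
    for field in sorted(left.keys() | right.keys()):
        left_value = left.get(field, MISSING)
        right_value = right.get(field, MISSING)
        if left_value != right_value:
            differences[field] = (left_value, right_value)
    return differences


if __name__ == "__main__":
    # Hypothetical snapshots of two Exchange servers.
    exch01 = {"os_version": "6.1.7601", "ram_mb": 16384, "cpu_count": 4}
    exch02 = {"os_version": "6.1.7601", "ram_mb": 32768, "iis_installed": True}
    print(diff_snapshots(exch01, exch02))
    # {'cpu_count': (4, '<missing>'), 'iis_installed': ('<missing>', True), 'ram_mb': (16384, 32768)}
```

Whatever schema is chosen, it only needs to be able to produce two such snapshots; the diff logic does not care whether they came from one server or two.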

As for the DB design, I would normally just normalize all of the data into an OLTP-type schema, where related config items are persisted to a table for their specific area (e.g. hardware info). But I'm thinking this may be a bad move and that I should be looking to save the data to some kind of OLAP-type data warehouse.

I'm just not sure which way to go with the DB design, so I could do with some direction on this. Should I normalize the data and create lots of tables, use one massive denormalized table with over 100 fields, or look into a star schema or something completely different (EAV)?

I am limited to using .NET and MS SQL Server 2005.

Edit: The tool that collects and stores the data will be run on an as-required basis, rather than grabbing the config data on a fixed daily or weekly schedule. I would be looking to keep the data for at least a couple of years.

Solution

In my experience, a star schema is best for reporting. You do not have to store the data as a star schema, though: the reporting model can be a set of views (indexed for performance) over the storage tables, and you can design those views later. The storage model should be a set of event tables that record configuration changes. You can start from a flat, log-file-like structure and normalize it iteratively until you find structures that work well for both storage and queries.

A storage model is good if it lets you define constraints on the data; a reporting model is good if it supports fast ad hoc queries. Focus on the storage model, because the reporting model is a denormalization of the storage model and it is easy to denormalize later. EAV structures are useless for both models: you cannot define constraints on them, and queries against them are complex anyway.
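The split between a constrained storage model and a denormalized reporting view can be sketched as below. This is an illustration under stated assumptions, not the answerer's design: it uses Python's built-in sqlite3 module as a self-contained stand-in for SQL Server 2005 (SQLite has no indexed views, so the reporting view here is a plain view), and the table, column and view names (server, hardware_snapshot, hardware_report) are hypothetical. Each collection run inserts one typed row per server for the hardware area, CHECK and foreign-key constraints reject bad data, and the view joins the event table to the server table for reporting.

```python
import sqlite3

# sqlite3 stands in for SQL Server 2005; all table/column/view names are hypothetical.
SCHEMA = """
CREATE TABLE server (
    server_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    role      TEXT NOT NULL              -- e.g. 'Exchange'
);

-- Event table: one row per collection run per server for the hardware area.
CREATE TABLE hardware_snapshot (
    server_id    INTEGER NOT NULL REFERENCES server(server_id),
    collected_at TEXT    NOT NULL,       -- ISO-8601 timestamp of the run
    cpu_count    INTEGER NOT NULL CHECK (cpu_count > 0),
    ram_mb       INTEGER NOT NULL CHECK (ram_mb > 0),
    os_version   TEXT    NOT NULL,
    PRIMARY KEY (server_id, collected_at)
);

-- Reporting view: a denormalized projection of the event table.
CREATE VIEW hardware_report AS
SELECT s.name AS server_name, s.role, h.collected_at,
       h.cpu_count, h.ram_mb, h.os_version
FROM hardware_snapshot AS h
JOIN server AS s ON s.server_id = h.server_id;
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def latest_by_role(conn: sqlite3.Connection, role: str) -> list:
    """Most recent hardware snapshot of every server with the given role."""
    return conn.execute(
        """
        SELECT server_name, collected_at, cpu_count, ram_mb, os_version
        FROM hardware_report AS r
        WHERE r.role = ?
          AND r.collected_at = (SELECT MAX(r2.collected_at)
                                FROM hardware_report AS r2
                                WHERE r2.server_name = r.server_name)
        ORDER BY server_name
        """,
        (role,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    conn.executemany("INSERT INTO server VALUES (?, ?, ?)",
                     [(1, "EXCH01", "Exchange"), (2, "EXCH02", "Exchange")])
    conn.executemany("INSERT INTO hardware_snapshot VALUES (?, ?, ?, ?, ?)", [
        (1, "2014-02-01T09:00:00", 4, 16384, "6.1.7601"),
        (1, "2014-02-15T09:00:00", 4, 32768, "6.1.7601"),
        (2, "2014-02-15T10:00:00", 8, 32768, "6.1.7601"),
    ])
    for row in latest_by_role(conn, "Exchange"):
        print(row)
    try:
        # A CHECK constraint rejects an impossible value at write time.
        conn.execute("INSERT INTO hardware_snapshot VALUES "
                     "(2, '2014-03-01T10:00:00', 0, 32768, '6.1.7601')")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

The rows returned by `latest_by_role` could be fed straight into a field-by-field comparison. Keeping one typed event table per configuration area is what makes the CHECK constraints possible; a generic EAV table with a single value column would not let you express `cpu_count > 0` this way.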

Licensed under: CC-BY-SA with attribution
