Question

I need to solve an exercise for a pharmaceutical company. They have asked me to create a database structure from scratch, given only a little information.

Because I want it to be scalable and "future proof", I decided to follow the star/snowflake structure. For now it looks more like a star than a snowflake, but the idea is there. I have a specific question about the main Fact Table (Study) and how to represent the Dimension Table Status.

I came up with 2 options:

  • OPTION 1: Fact Table Study > Foreign Key > Dimension Table Contract > Foreign Keys > Dimension Table Status.

So we pass through two Dimension Tables in order to represent the Status on the Fact Table. I would do that with a JOIN in a view or something similar.

[Image: schema diagram for Option 1]

  • OPTION 2: Fact Table Study > Foreign Key > Dimension Table Status.

This way we link the Fact Table (Study) directly to the Dimension Table Status.

[Image: schema diagram for Option 2]

Should I go for Option 1 or Option 2?

I'm wary of linking too many Dimension Tables to the Fact Table. This is the first time I have tried to create a database structure.

Any advice is welcome, any suggestion is welcome, especially from seasoned database architects but also from amateurs.

Thank you

EDIT: adding some more information

Thank you @AntC for your question. In fact this is not a Data Warehouse scenario at all: the scenario is a pharmaceutical company that needs new software to track its clinical trials.

But of course the best-known schemas are Star/Snowflake, and I don't want to use a hierarchical schema. At the same time I want to avoid shapes like a triangle, diamond or circle because, even if there are only 100 users now, who knows what this database will look like in 10 years. The idea is to design something for the long term, and as far as I know the Star/Snowflake shape is also chosen for ordinary applications.


Solution

In Option 2 the Status_Code appears as a non-key attribute in two tables: Study and Contract. But which of those entities does it belong to? And what are the relative cardinalities of those entities?

Looking at Option 1, the Status_Code belongs to the Contract; there can be multiple Studies for a given Contract.

So if you were to adopt Option 2, you would give OLTP a headache: any time the Status of a Contract changes, transaction processing needs to replicate that change to all of its Studies, with an obvious risk of the copies getting out of sync. That's called an update anomaly.
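To make the update anomaly concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout is an assumption for illustration only: apart from Status_Code, the column names (contract_id, study_id) and the status values 'OPEN' and 'CLOSED' are hypothetical and not taken from the diagrams.

```python
import sqlite3

# Hypothetical Option 2 layout: status_code is stored on contract AND on study.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status   (status_code TEXT PRIMARY KEY);
    CREATE TABLE contract (contract_id INTEGER PRIMARY KEY,
                           status_code TEXT NOT NULL REFERENCES status(status_code));
    CREATE TABLE study    (study_id    INTEGER PRIMARY KEY,
                           contract_id INTEGER NOT NULL REFERENCES contract(contract_id),
                           status_code TEXT NOT NULL REFERENCES status(status_code));
    INSERT INTO status   VALUES ('OPEN'), ('CLOSED');
    INSERT INTO contract VALUES (1, 'OPEN');
    INSERT INTO study    VALUES (10, 1, 'OPEN'), (11, 1, 'OPEN');
""")

# The contract is closed, but nothing updates the copies held on study.
conn.execute("UPDATE contract SET status_code = 'CLOSED' WHERE contract_id = 1")

# Find studies whose copied status now disagrees with their contract.
stale_studies = conn.execute("""
    SELECT s.study_id
    FROM study s
    JOIN contract c ON c.contract_id = s.contract_id
    WHERE s.status_code <> c.status_code
    ORDER BY s.study_id
""").fetchall()
print(stale_studies)
```

Both studies still carry the old status after the update, so every code path that changes a contract would also have to remember to update its studies.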

Denormalising the model as in Option 2 is legitimate in data warehouses, because there we want fast reporting. It is unwise in a data model oriented towards transaction processing, because of the processing cost and the risk involved in keeping the duplicated data in sync.

So for an application focused on transaction processing, Option 1 is indicated. And yes, you would create a join view Study > Contract > Status.
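A rough sketch of what Option 1 with such a join view could look like, again using Python's sqlite3 module. The view name study_status, the description column, and the key column names are assumptions made up for this example; only the Study > Contract > Status chain comes from the question.

```python
import sqlite3

# Hypothetical Option 1 layout: status_code lives only on contract.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status   (status_code TEXT PRIMARY KEY,
                           description TEXT NOT NULL);
    CREATE TABLE contract (contract_id INTEGER PRIMARY KEY,
                           status_code TEXT NOT NULL REFERENCES status(status_code));
    CREATE TABLE study    (study_id    INTEGER PRIMARY KEY,
                           contract_id INTEGER NOT NULL REFERENCES contract(contract_id));

    -- Join view: Study > Contract > Status.
    CREATE VIEW study_status AS
        SELECT s.study_id, c.contract_id, st.status_code, st.description
        FROM study s
        JOIN contract c  ON c.contract_id  = s.contract_id
        JOIN status   st ON st.status_code = c.status_code;

    INSERT INTO status   VALUES ('OPEN', 'Contract open'), ('CLOSED', 'Contract closed');
    INSERT INTO contract VALUES (1, 'OPEN');
    INSERT INTO study    VALUES (10, 1), (11, 1);
""")

# A single-row update on contract is all that is needed.
conn.execute("UPDATE contract SET status_code = 'CLOSED' WHERE contract_id = 1")

rows = conn.execute(
    "SELECT study_id, status_code FROM study_status ORDER BY study_id"
).fetchall()
print(rows)
```

Because the status is stored in one place, the view always reports the contract's current status for every study, and there is nothing to keep in sync.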

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange