Question

I am confused in some questions. I need their answers. If our relational model is also De-normalize then why we prefer dimensional model ? What is the reason we prefer dimensional model over relational model ? Your historical data can also stored in OLTP and you can perform reporting easily on any OLTP then why we use dimensional model and data warehouse ? What is the difference between a dimension and a de-normalized table ?

Thanks in advance

Était-ce utile?

La solution

Short answer:

If your lookups / retrievals from your OLTP tables are fast enough, and your specific search requirements do not have such complications as are described below, then there should not be a need to get into any dimensional star-schemas.

Long answer:

Dimensional and Denormalized models have different purposes. Dimensional models are generally used for data warehousing scenarios, and are particularly useful where super-fast query results are required for computed numbers such as "quarterly sales by region" or "by salesperson". Data is stored in the Dimensional model after pre-calculating these numbers, and updated as per some fixed schedule.

But even without a data warehouse involved, a Dimensional model could be useful, and its purpose could complement that of the Denormalized model, as in the following example:

A Dimensional model enables fast search. Joins between the dimension tables and the fact table are set up in a star-schema. Searching for John Smith would be simplified because we'll search for John OR Smith only in the relevant dimension table, and fetch the corresponding person ids from the fact table (fact table FKs point to dimension table PKs), thereby getting all persons with either of the 2 keywords in their name. (A further enhancement would enable us to search for all persons having variations of "John Smith" in their names e.g. John, Jon, Johnny, Jonathan, Smith, Psmith, Smythe by building snowflake dimensions.)

A Denormalized model, on the other hand, enables fast retrieval, such as returning back a lot of columns about a specific item without having to join multiple tables together.

So in the above scenario, we would first use the Dimensional model to get a set of IDs for the persons of our interest, and then use the Denormalized table to get full details of those selected IDs without having to do any further joins.

This kind of a search would be very slow if we directly query the Denormalized tables, because a text search will need to be done on the PersonName column. It becomes even slower if we try to include the name variations, or if we need to add more search criteria.

Excellent reference:

An excellent reference for learning about the vast (and very interesting) topic of Dimensional Modeling is Ralph Kimball's The Data Warehouse Lifecycle Toolkit. Its companion volume The Data Warehouse Toolkit covers a large number of actual use cases.

Hope this helps!

Autres conseils

A dimensional model uses denormalisation as one of its techniques in order to optimise the database for: - query performance, and - user understanding.

OLTP systems are typically hard to report from and also slow, being optimised as they are for OLTP (insert, update, delete) performance and also to protect transactional integrity. A data warehouse, using a dimensional model, still uses relational techniques but is instead optimised to consider the experience of getting the data out over getting the data in.

Truth is, you can't always report easily from any OLTP system: the tables are often obscurely titled without considering people are going to want to get at the data to make business decisions. Reporting tools that generate SQL also struggle to make performant queries on your typical normalised schema.

Modern advances in OLTP technologies provide alternatives to dimensional models that address performance issues, but still do not tackle the typical steps made in creating a dimensional model, to make the database tables easier to comprehend and navigate.

A dimension is a table that is intended to represent a business concept or entity, giving context to a particular measurement of a business process (or 'fact'). Dimensions are typically denormalised in a dimensional model both to reduce the number of tables to comprehend/navigate but also to reduce the number of joins for performance reasons. For example, a Product dimension may contact Brand information whereas in an OLTP model these would be separate tables, which allows users to filter a Fact by Brand directly without traversing multiple tables.

I agree with @Rich, mainly the fact that dimensional model uses denormalized tables. I had started following Kimball's book as @Krishna indicates, about 2 year ago. I think you will get answers to all your questions/doubts if you read this book. Please note, if you aim for some kind of BI solution, then per my opinion, follow dimensional modelling. This is for reporting ease, beiing true and closer to business process. You can perhaps also make report direct from OLTP system, but your reporting solution may not survive test of user's ever changing demands. Dimension modelling is done while remaining close to natural business process. At the same time, it remains so flexible that any other add-on process can be done easily like setting up piece of puzzle when you are closer to solve it.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top