Is it acceptable to have a database for direct queries and a second database for SSAS if they use the same data?

StackOverflow https://stackoverflow.com/questions/20804191

Question

I am working on a server that will be used by many users across a large corporation, and I have two databases storing the same information in different ways.

Databases:

  • DB1 = simple relational schema for direct queries
  • DB2 = star schema with facts and dimensions for the SSAS cubes

End Users:

  • Analysts - create reports (Crystal Reports) and query the data directly (using DB1)
  • Management - use the analysis cubes to look at the data (using DB2)

Both databases are created/updated daily through two steps:

  1. The production source data is imported and transformed into understandable, "common sense" relational tables stored in DB1.

  2. This step transforms the DB1 tables into fact and dimension tables and stores them in DB2. The SSAS cubes are then built from these tables and stored here. (A rough sketch of this step follows below.)
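To make step 2 concrete, here is a minimal T-SQL sketch, assuming hypothetical names throughout (a DB1.dbo.Sales source table and DB2 tables DimDate, DimProduct, and FactSales); the actual schemas are not given in the question.

```sql
-- Hypothetical sketch of step 2: populate the star schema from DB1.
-- Add any dates not yet present in the date dimension.
INSERT INTO DB2.dbo.DimDate (DateKey, FullDate, [Year], [Month])
SELECT DISTINCT
       CONVERT(int, FORMAT(s.SaleDate, 'yyyyMMdd')),
       s.SaleDate, YEAR(s.SaleDate), MONTH(s.SaleDate)
FROM DB1.dbo.Sales AS s
WHERE NOT EXISTS (SELECT 1
                  FROM DB2.dbo.DimDate AS d
                  WHERE d.DateKey = CONVERT(int, FORMAT(s.SaleDate, 'yyyyMMdd')));

-- Load the fact table at the grain of one row per sale.
INSERT INTO DB2.dbo.FactSales (DateKey, ProductKey, Quantity, Amount)
SELECT CONVERT(int, FORMAT(s.SaleDate, 'yyyyMMdd')),
       p.ProductKey, s.Quantity, s.Amount
FROM DB1.dbo.Sales AS s
JOIN DB2.dbo.DimProduct AS p
  ON p.ProductNaturalKey = s.ProductId;
```

In practice this would run as a scheduled job (for example, a SQL Agent step or an SSIS package) after step 1 completes, followed by processing of the SSAS cubes.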

Question

Based on that description, is this acceptable from a design standpoint? Or would it be better to ditch the "simple" database and instead use views to assemble the data into "simple" tables for the analysts to query?
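To illustrate the alternative being asked about, here is a minimal sketch (reusing the hypothetical star-schema tables from above) of a view that flattens the star schema back into a "simple" table for analysts:

```sql
-- Hypothetical: present the star schema as one flat, analyst-friendly table.
CREATE VIEW dbo.vSalesSimple AS
SELECT d.FullDate AS SaleDate,
       p.ProductName,
       f.Quantity,
       f.Amount
FROM dbo.FactSales AS f
JOIN dbo.DimDate    AS d ON d.DateKey    = f.DateKey
JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey;
```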


Solution 2

It's never a good idea to have just the data warehouse (your DB2). When you denormalize data into a data warehouse or cube for reporting, you are "throwing away" information: once details have been summarized away, they are lost.

Let's say you suffer damage to, or loss of, your cube/data warehouse. Or let's say new types of analysis are needed that require you to recalculate your cubes. If you've thrown away your raw, detailed data, you won't be able to recreate your summarized data.
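A small illustration of the point, again with hypothetical table names: once a summary like the one below is all you keep, the GROUP BY has discarded the individual rows, and no query against it can recover a finer grain.

```sql
-- Hypothetical monthly summary. If only this table survives, the
-- per-transaction detail aggregated away by the GROUP BY is gone:
-- a new daily or per-product analysis cannot be rebuilt from it.
SELECT d.[Year], d.[Month], SUM(f.Amount) AS MonthlyAmount
INTO dbo.SalesByMonth
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d ON d.DateKey = f.DateKey
GROUP BY d.[Year], d.[Month];
```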

For this reason, and because some kinds of queries need the transactional detail (or simply run more efficiently against it), it's a good idea to load your raw data into a relational data store and then build your summarized cubes/star schemas from that.

Note that whenever you have two data stores, you have the potential for their results to get out of sync, and you need rules and processes in place to handle those situations. One data store should always be the "book of record"; typically this will be the one holding the most detailed information (your DB1). You also want detective controls in place that look for discrepancies and trigger a recalculation of the summarized data whenever the two stores drift apart.
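Such a detective control can be as simple as a scheduled reconciliation query. A minimal sketch, assuming the hypothetical tables above:

```sql
-- Compare a control total in the book of record (DB1) against the
-- star schema (DB2); any returned row signals drift between the stores.
SELECT src.Total AS DB1Total, dw.Total AS DB2Total
FROM  (SELECT SUM(Amount) AS Total FROM DB1.dbo.Sales)     AS src
CROSS JOIN
      (SELECT SUM(Amount) AS Total FROM DB2.dbo.FactSales) AS dw
WHERE src.Total <> dw.Total;
```

In practice this would run after each nightly load, with an alert or an automatic rebuild of the DB2 tables when it returns a row.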

OTHER TIPS

Your typical data mart will include a staging area (either a separate schema or a separate database) into which the raw data is brought essentially unaltered. This is basically what you have with your DB1. Your data load process then transforms the staged data into your facts and dimensions; this is your DB2.
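As an illustration of that load process, here is a minimal sketch of a dimension load, with hypothetical names throughout: a type-1 upsert via MERGE that inserts new products and overwrites changed attributes on existing ones.

```sql
-- Hypothetical type-1 dimension load from the staged DB1 data.
MERGE DB2.dbo.DimProduct AS tgt
USING (SELECT ProductId, ProductName, Category
       FROM DB1.dbo.Products) AS src
   ON tgt.ProductNaturalKey = src.ProductId
WHEN MATCHED AND (tgt.ProductName <> src.ProductName
               OR tgt.Category    <> src.Category) THEN
    UPDATE SET tgt.ProductName = src.ProductName,
               tgt.Category    = src.Category
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductNaturalKey, ProductName, Category)
    VALUES (src.ProductId, src.ProductName, src.Category);
```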

Star schemas are not just for cubes; they are great for SQL queries as well. You shouldn't be worried about "throwing away information": you only "lose" the data you don't include in your schema.

Star schemas are optimized for reading and querying rather than for writing and updating, and they are typically easier for analysts to understand. Analysts' queries will also typically involve fewer joins against a star schema. I would bet that, most of the time, efficiently written queries against the denormalized star schema will return the data the analysts need faster than the equivalent queries against the normalized database. You can run a test to prove this out.

If the analysts' queries pull mostly summarized data with few details and you feel they are too slow, you can build summarized views over your fact tables at a higher level of granularity, or give the analysts access to your cube. If they pull a lot of detail, the detailed fact tables at their original level of granularity should be fine. You can also improve query times by using indexes effectively and tuning for the most frequently run queries.
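To make those last suggestions concrete, here is a minimal sketch under the same hypothetical schema: a summarized view at a higher grain for mostly-summary workloads, and an index to support frequently run, date-ranged detail queries.

```sql
-- Hypothetical summarized view at year/month/product grain.
CREATE VIEW dbo.vSalesByMonthProduct AS
SELECT d.[Year], d.[Month], p.ProductName,
       SUM(f.Quantity) AS TotalQuantity,
       SUM(f.Amount)   AS TotalAmount
FROM dbo.FactSales AS f
JOIN dbo.DimDate    AS d ON d.DateKey    = f.DateKey
JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey
GROUP BY d.[Year], d.[Month], p.ProductName;
GO

-- Hypothetical index supporting frequent date-ranged detail queries.
CREATE NONCLUSTERED INDEX IX_FactSales_DateKey
    ON dbo.FactSales (DateKey)
    INCLUDE (ProductKey, Quantity, Amount);
```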

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow