Question

I am new to SSIS and data warehousing. I am using Microsoft Business Intelligence Development Studio.

I have 5 dimensions, each with its own PK. I have a fact table that contains the PKs of all the dimensions, i.e. there is a foreign key relationship to each one (as in a star schema).

Now, what is the best practice for loading the fact table?

What I have done is write a cross join query between the 5 dimensions and dump the result set into the fact table. But I don't think this is good practice.

I am completely new to MS SSIS, so please describe your suggestions in detail.

Thanks


Solution

I would echo @Damir's points about Project Real and Kimball. I am a fan of both.

To answer your question, here are some more thoughts:

  • load your date dimension and other "static" dimensions as a one-off load
  • load a placeholder record into each of your dimensions to take care of NULL and UNKNOWN values
  • load your dimensions. For each dimension, decide on a column-by-column basis whether you want type 1 or type 2 slowly changing dimension behaviour. Be cautious and choose mostly type 1 unless there is a good reason for type 2.
  • [edited] load your fact table by joining the staging transaction data that will go into the fact table to your new dimension tables using the business keys, thus looking up the dimension keys (the fact table's foreign keys) as you go. E.g. sales transactions will have a store number (the business key), which you would look up in DimStore (already loaded in the previous step) to get the kStore of DimStore; you would then record kStore against that transaction in FactSalesTransaction.
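
As a rough illustration of the last step, here is a minimal sketch of the key lookup in plain SQL, run through Python's built-in sqlite3 module so it is self-contained. It is not SSIS itself; in a package the same join is usually done with a Lookup transformation or a set-based INSERT ... SELECT in an Execute SQL task. The table names DimStore and FactSalesTransaction and the column kStore come from the example above; StagingSales, StoreNumber, the -1 "Unknown" key and the sample rows are assumptions made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimStore (
    kStore      INTEGER PRIMARY KEY,   -- surrogate key
    StoreNumber TEXT UNIQUE,           -- business key from the source system
    StoreName   TEXT
);
CREATE TABLE StagingSales (            -- hypothetical staging table
    StoreNumber TEXT,
    SaleDate    TEXT,
    Amount      REAL
);
CREATE TABLE FactSalesTransaction (
    kStore   INTEGER NOT NULL REFERENCES DimStore (kStore),
    SaleDate TEXT,
    Amount   REAL
);

-- Placeholder member for NULL or unmatched business keys (-1 is an assumed convention).
INSERT INTO DimStore (kStore, StoreNumber, StoreName) VALUES (-1, NULL, 'Unknown');
INSERT INTO DimStore (kStore, StoreNumber, StoreName) VALUES
    (1, 'S001', 'Downtown'),
    (2, 'S002', 'Airport');

INSERT INTO StagingSales (StoreNumber, SaleDate, Amount) VALUES
    ('S001', '2024-01-05', 100.0),
    ('S002', '2024-01-05', 250.0),
    ('S999', '2024-01-06', 40.0),      -- store not present in DimStore
    (NULL,   '2024-01-06', 15.0);      -- missing business key
""")

# Join staging rows to the dimension on the business key and store the
# surrogate key in the fact table; rows with no match fall back to -1.
conn.execute("""
INSERT INTO FactSalesTransaction (kStore, SaleDate, Amount)
SELECT COALESCE(d.kStore, -1), s.SaleDate, s.Amount
FROM StagingSales AS s
LEFT JOIN DimStore AS d ON d.StoreNumber = s.StoreNumber
""")

fact_rows = conn.execute(
    "SELECT kStore, SaleDate, Amount FROM FactSalesTransaction "
    "ORDER BY SaleDate, Amount"
).fetchall()
for row in fact_rows:
    print(row)
```

Note that the fact rows come from the staging transactions, one fact row per transaction, rather than from a cross join of the dimensions. The LEFT JOIN plus the placeholder member means a transaction with an unknown store is still loaded instead of being silently dropped.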

Other general things you should consider (not related to your question, but worth thinking about if you are starting out):

  • Data archiving. How long will you keep data online? / when will it be deleted?
  • Table partitioning. If you have very large fact table(s), you should consider partitioning on a date or subject area basis. Date is quite nice, as you can do some interesting things with regard to dropping old partitions when the data is too old as part of the standard load process.
  • Having the DWH as a snowflaked schema, then using a set of views to flatten the snowflake into a star. This is particularly useful when putting an OLAP cube on top of a SQL DWH, as it simplifies the cube design.
  • How are you going to manage different environments (Dev/Test/etc/Prod)? Using one of the SQL Server configuration styles is imperative.
  • Build a template SSIS package with all the variables you need and the configuration/connection strings you want. It will save loads of time to do that now, rather than having to rework packages when you discover new things. Do trivial prototypes initially to prove your methodology!
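
To make the snowflake-to-star point above a bit more concrete, here is a minimal sketch of a flattening view, again using Python's built-in sqlite3 module so it runs on its own. All table, column and view names (DimProduct, DimProductCategory, vDimProduct) and the sample rows are hypothetical; the idea is only that the cube or reporting layer reads the view and never sees the extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked storage: the category lives in its own table.
CREATE TABLE DimProductCategory (
    kCategory    INTEGER PRIMARY KEY,
    CategoryName TEXT
);
CREATE TABLE DimProduct (
    kProduct    INTEGER PRIMARY KEY,
    ProductName TEXT,
    kCategory   INTEGER REFERENCES DimProductCategory (kCategory)
);

-- Star-shaped presentation: one flat row per product.
CREATE VIEW vDimProduct AS
SELECT p.kProduct, p.ProductName, c.CategoryName
FROM DimProduct AS p
JOIN DimProductCategory AS c ON c.kCategory = p.kCategory;

INSERT INTO DimProductCategory (kCategory, CategoryName) VALUES
    (1, 'Beverages'),
    (2, 'Snacks');
INSERT INTO DimProduct (kProduct, ProductName, kCategory) VALUES
    (10, 'Cola', 1),
    (11, 'Crisps', 2);
""")

star_rows = conn.execute(
    "SELECT kProduct, ProductName, CategoryName FROM vDimProduct ORDER BY kProduct"
).fetchall()
for row in star_rows:
    print(row)
```

With this layout, the fact table still joins to the view on kProduct only, so the consumer sees a single product dimension even though it is stored as two tables.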

OTHER TIPS

Take a look at the Microsoft Project Real examples. Also get a Kimball book and read up on loading fact tables; the topic covers several chapters.

Licensed under: CC-BY-SA with attribution