Question

How do I populate Dim_tbls from a relational source?

These example tables are given:

tbl_sales:    id_sales, fk_id_customer, fk_id_product, country, timestamp   
tbl_customer: id_customer, name, adress, zip, city
tbl_product:  id_product, price, product

My goal is to get these attributes into a start-schema. The problem I have is the logic behind loading the dimension tables. I mean, what data would I load into the Dim_Product? All the products that are in tbl_product? But how would I know how many Sales are done with a specific product?

Analysis I would like to do are:

 How many people bought product x.
 How many sales are made from city x.
 How many sales were made between Time x and y. 

Example data:

 tbl_sales: id_sales | fk_id_customer | fk_id_product | country | timestamp 
                1    |       2        |      1        |   UK    | 19.11.2013 10:23:22
                2    |       1        |      2        |   FR    | 20.11.2013 06:04:22

 tbl_customer: id_customer | name | adress | zip | city
                      1    | Frank|Street X| 211 | London
                      2    | Steve|Street Y| 431 | Paris

 tbl_customer: id_product| Price | product
                      1  | 100,00| Hammer
                      2  |  50,00| Saw
Was it helpful?

Solution

Let's start with a very simple star schema model; for example, I assumed you don't need to worry about handling changes to dimensions' attributes.

factSales

  DateKey
  CustomerKey
  ProductKey
  Counter (=1; this is a factless fact table)

dimDate

  DateKey
  Date
  Year
  Quarter
  Month
  ...

dimCustomer

  CustomerKey
  Name
  Address
  Zip
  City

dimProduct

  ProductKey
  Name
  Price (if it changes, you need move it to factSales)

How many people bought product x.

SELECT DISTINCT CustomerKey
FROM factSales
WHERE ProductKey IN ( SELECT ProductKey
                      FROM dimProduct
                      WHERE Name = 'Product X' )

How many sales are made from city x.

SELECT SUM(Counter)
FROM factSales
WHERE CustomerKey IN ( SELECT CustomerKey
                       FROM dimCustomer
                       WHERE City = 'City X' )

How many sales were made between Time x and y.

SELECT SUM(Counter)
FROM factSales
WHERE DateKey IN ( SELECT DateKey
                   FROM dimDate
                   WHERE Date BETWEEN DateX AND DateY )
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top