Question

I am trying to build a data warehouse (DW) on top of an operational database. In the operational database, location information is stored in normalized tables like these:

MM_CITY
{
    CITY_ID;
    CITY_NAME;
}

MM_DISTRICT
{
    CITY_ID;
    DISTRICT_ID;
    DISTRICT_NAME;
}

MM_REGION
{
    DISTRICT_ID;
    REGION_ID;
    REGION_NAME;
}

FACT_TABLE
{
    REGION_ID; 
    COST;
}

I want to build a region dimension and connect it to the fact table, like this:

REGION_DIMENSION
{
    REGION_ID;
    REGION_NAME;
    DISTRICT_NAME;
    CITY_NAME;
}

I could do that with a SQL join, but since there are other dimensions as well, it is difficult to move the data from the original database into the new DW just by writing SQL.
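For reference, the join mentioned above could look roughly like the sketch below. It uses Python's built-in sqlite3 module only so that the SQL is runnable; the sample rows and the `build_region_dimension` helper are made up for illustration and are not part of the real database.

```python
import sqlite3


def build_region_dimension(conn):
    """Flatten city/district/region into one REGION_DIMENSION row per region."""
    conn.executescript("""
        DROP TABLE IF EXISTS REGION_DIMENSION;
        CREATE TABLE REGION_DIMENSION AS
        SELECT r.REGION_ID,
               r.REGION_NAME,
               d.DISTRICT_NAME,
               c.CITY_NAME
        FROM MM_REGION r
        JOIN MM_DISTRICT d ON d.DISTRICT_ID = r.DISTRICT_ID
        JOIN MM_CITY c ON c.CITY_ID = d.CITY_ID;
    """)


# Hypothetical sample data, only here to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MM_CITY (CITY_ID INTEGER, CITY_NAME TEXT);
    CREATE TABLE MM_DISTRICT (CITY_ID INTEGER, DISTRICT_ID INTEGER, DISTRICT_NAME TEXT);
    CREATE TABLE MM_REGION (DISTRICT_ID INTEGER, REGION_ID INTEGER, REGION_NAME TEXT);
    INSERT INTO MM_CITY VALUES (1, 'City A');
    INSERT INTO MM_DISTRICT VALUES (1, 10, 'District X'), (1, 11, 'District Y');
    INSERT INTO MM_REGION VALUES
        (10, 100, 'Region P'), (10, 101, 'Region Q'), (11, 102, 'Region R');
""")

build_region_dimension(conn)
rows = conn.execute(
    "SELECT * FROM REGION_DIMENSION ORDER BY REGION_ID"
).fetchall()
for row in rows:
    print(row)
```

Because REGION_ID is kept as the key of the flattened table, FACT_TABLE can join to it directly without any new ID scheme.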

Is there an ETL tool (like Kettle) that can handle the data transfer when the table structure changes? How would I do that? Any reference material would be greatly appreciated.

Thanks in advance.


Comments:

It's my own confusion here. Actually, if REGION_DIMENSION has CITY_ID, DISTRICT_ID and REGION_ID, it doesn't need any additional ID naming. The original ID system is enough to use in the DW.


Solution 3

Hope this is what you want: a dimension with geography details.

DIM_GEOGRAPHY
{
    PK;
    CITY_ID;
    CITY_NAME;
    DISTRICT_ID;
    DISTRICT_NAME;
    REGION_ID;
    REGION_NAME;
}

FACT_TABLE
{
    PRIMARY_KEY;
    CITY_ID;
    COST;
}

You can then query this structure like this:

SELECT
    DIM.DISTRICT_NAME AS District_Name,
    SUM(F.COST) AS Total_Cost
FROM FACT_TABLE F
INNER JOIN DIM_GEOGRAPHY DIM
    ON F.CITY_ID = DIM.CITY_ID
-- Optional region filter; a WHERE clause must come before GROUP BY
-- WHERE DIM.REGION_NAME = 'XYZ'
GROUP BY DIM.DISTRICT_NAME

This returns the total cost per district. Enabling the commented-out WHERE clause restricts the result to one particular region.
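To try the query against some data, here is a minimal sketch using Python's built-in sqlite3 module. The rows, the `cost_by_district` helper and the region name 'XYZ' are invented for illustration. It also assumes DIM_GEOGRAPHY holds exactly one row per CITY_ID; if a city appeared in several dimension rows, the join would multiply the fact rows and inflate the totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DIM_GEOGRAPHY (
        PK INTEGER PRIMARY KEY,
        CITY_ID INTEGER, CITY_NAME TEXT,
        DISTRICT_ID INTEGER, DISTRICT_NAME TEXT,
        REGION_ID INTEGER, REGION_NAME TEXT
    );
    CREATE TABLE FACT_TABLE (
        PRIMARY_KEY INTEGER PRIMARY KEY,
        CITY_ID INTEGER,
        COST REAL
    );
    -- Hypothetical rows: one dimension row per CITY_ID (assumption).
    INSERT INTO DIM_GEOGRAPHY VALUES
        (1, 1, 'City A', 10, 'District X', 100, 'XYZ'),
        (2, 2, 'City B', 10, 'District X', 100, 'XYZ'),
        (3, 3, 'City C', 11, 'District Y', 100, 'XYZ'),
        (4, 4, 'City D', 12, 'District Z', 200, 'Other');
    INSERT INTO FACT_TABLE VALUES
        (1, 1, 10.0), (2, 2, 5.0), (3, 3, 7.5), (4, 4, 99.0), (5, 1, 2.5);
""")


def cost_by_district(conn, region_name):
    """Total cost per district, restricted to one region."""
    query = """
        SELECT DIM.DISTRICT_NAME, SUM(F.COST)
        FROM FACT_TABLE F
        INNER JOIN DIM_GEOGRAPHY DIM ON F.CITY_ID = DIM.CITY_ID
        WHERE DIM.REGION_NAME = ?
        GROUP BY DIM.DISTRICT_NAME
        ORDER BY DIM.DISTRICT_NAME
    """
    return conn.execute(query, (region_name,)).fetchall()


totals = cost_by_district(conn, "XYZ")
for district, total in totals:
    print(district, total)
```

The region name is passed as a bound parameter rather than pasted into the SQL string, which keeps the filter reusable for other regions.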

OTHER TIPS

It's a fairly complex process to explain in full here. First you need to understand how a data warehouse is designed. Then you have to use an ETL tool such as SSIS to build the data warehouse. You will find plenty of tutorials on SSIS, which is Microsoft's ETL product.

So I suggest you go with the SSIS ETL tool for your first ETL. Later you can move on to other widely used ETL tools such as Informatica.

I am providing some links here. Please refer to them.

  1. Create first Data Warehouse
  2. SSIS1
  3. SSIS2
  4. SSIS3 Tutorial
  5. SSIS4 Tutorial

These are general links from which you can pick up the logic and implement it in your scenario.

Good Luck.

Aditya's advice is correct. Unless you are managing a hugely complex ETL process, it would be better to isolate your table changes outside of the ETL process and then just update your package accordingly.

You can manage schema changes and even automate the creation of new packages and tables with languages such as Biml. This might be worth doing if you are managing hundreds of table changes each year, but for a small number of changes the effort will far outweigh the benefit.

Licensed under: CC-BY-SA with attribution