Question

I know the basic difference between a star schema and a snowflake schema: a snowflake schema breaks dimension tables down into multiple tables in order to normalize them, while a star schema has only one "level" of dimension tables. But the Wikipedia article for Snowflake Schema says:

"Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools to form such a query, especially if provision for these forms of query weren't anticipated when the data warehouse was first designed."

When would it be impossible to write a query in a star schema that could be written in a snowflake schema for the same underlying data? It seems like a star schema would always allow the same queries.


Solution

For data mining, you almost always have to prepare your data first, usually as one "flat table".

That flat table may be a query, a prepared view, or a CSV export, depending on the tool and your preference.
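As a rough illustration of the "flat table" idea, here is a minimal sketch using Python's built-in `sqlite3` module. The tables (`fact_sales`, `dim_customer`, `dim_product`), their columns, the sample rows, and the view name `flat_sales` are all made up for this example; a real warehouse would have its own names and far more columns.

```python
import sqlite3

def build_flat_table(conn):
    """Create a tiny hypothetical star schema and flatten it into one view."""
    cur = conn.cursor()
    cur.executescript("""
        -- Dimension tables (one "level" only, as in a star schema)
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
        -- Fact table referencing both dimensions
        CREATE TABLE fact_sales (
            sale_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id  INTEGER REFERENCES dim_product(product_id),
            amount REAL
        );
        INSERT INTO dim_customer VALUES (1, 'Alice', 'EU'), (2, 'Bob', 'US');
        INSERT INTO dim_product  VALUES (10, 'Laptop', 'Hardware'), (11, 'Novel', 'Books');
        INSERT INTO fact_sales   VALUES (1, 1, 10, 1200.0), (2, 1, 11, 30.0), (3, 2, 11, 20.0);

        -- The "flat table": one denormalized row per sale
        CREATE VIEW flat_sales AS
        SELECT s.sale_id,
               c.name AS customer, c.region,
               p.name AS product,  p.category,
               s.amount
        FROM fact_sales s
        JOIN dim_customer c ON c.customer_id = s.customer_id
        JOIN dim_product  p ON p.product_id  = s.product_id;
    """)
    return cur.execute("SELECT * FROM flat_sales ORDER BY sale_id").fetchall()

conn = sqlite3.connect(":memory:")
for row in build_flat_table(conn):
    print(row)
```

The same `SELECT` could just as well be run as an ad hoc query or dumped to CSV; the view is only one of the three options mentioned above.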

Now, to properly understand that article, one would probably have to smoke-drink the same thing as the author when he/she wrote it.

OTHER TIPS

As you mention, preparing a flat table for data mining starting from a relational database is no simple task, and the snowflake or the star schema only work up to a point.

However, there is a software tool called Dataconda that automatically creates a flat table from a database.

Basically, you select a target table in a relational database, and Dataconda "expands" it by adding thousands of new attributes to it; these attributes are obtained by executing complex queries involving multiple tables.
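To give a feel for what "expanding" a target table with derived attributes can look like, here is a hand-written sketch using Python's built-in `sqlite3` module. This is not Dataconda's API or algorithm, which is not shown here; it only illustrates the general idea of computing new per-row attributes by aggregating over related tables. The `customer` and `purchase` tables, their columns, and the three derived attributes are assumptions made up for the example.

```python
import sqlite3

def expand_customers(conn):
    """Add aggregate attributes from a related table to each target row."""
    cur = conn.cursor()
    cur.executescript("""
        -- Hypothetical target table and one related table
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (
            purchase_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            category TEXT,
            amount REAL
        );
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO purchase VALUES
            (1, 1, 'Hardware', 1200.0),
            (2, 1, 'Books', 30.0),
            (3, 2, 'Books', 20.0);
    """)
    # LEFT JOIN keeps customers without purchases; their aggregates fall back to 0.
    return cur.execute("""
        SELECT c.customer_id,
               c.name,
               COUNT(p.purchase_id)         AS n_purchases,
               COALESCE(SUM(p.amount), 0.0) AS total_spent,
               COUNT(DISTINCT p.category)   AS n_categories
        FROM customer c
        LEFT JOIN purchase p ON p.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.customer_id
    """).fetchall()

conn = sqlite3.connect(":memory:")
for row in expand_customers(conn):
    print(row)
```

An automated tool would presumably generate many such aggregates across many related tables; here there are only three, written by hand.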

Licensed under: CC-BY-SA with attribution