Question

I am designing my first data model and am confused about how to implement a degenerate dimension (DD). According to Kimball's design tip 46, a degenerate dimension that's unwieldy (alphanumeric) can be replaced with a surrogate key. I have a two-column degenerate key, char(6) and varchar(30), on a fact table whose grain is one row per transaction line. Stored inline, the DD would consume far more space than a simple integer key; however, it carries no additional context that would justify a dimension of its own. Would it be best to leave it in the fact table and accept the extra space, or is it worth moving it into a separate dimension, knowing that the dimension would grow in proportion to the fact table (roughly 25% of its row count, as there is an average of 4 lines per transaction)?

Solution

Kimball's guidance on performance and space usage should be taken with a grain of salt when you are using modern columnar storage. In SQL Server data warehouses, fact tables should typically be stored as clustered columnstore indexes. Low-cardinality attributes are much cheaper in columnar storage than in uncompressed rowstores, because repeated values compress well.

Wide, high-cardinality attributes in a columnstore do have a large impact on storage size, but not on query performance, because a column is fetched and scanned only when a query references it. The engine doesn't have to fetch, cache, and read the whole row.
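
For a rough sense of the space trade-off in the question, here is a back-of-envelope sketch in Python. It is only an estimate under stated assumptions: uncompressed row storage, a 4-byte integer surrogate key, 2 bytes of length overhead for the varchar column, one dimension row per transaction, and an average varchar fill that you supply yourself. Row headers, page overhead, indexes, and any compression are ignored, and the function names are made up for illustration.

```python
CHAR_LEN = 6           # char(6) always stores 6 bytes
VARCHAR_OVERHEAD = 2   # assumed per-value length overhead for varchar
INT_KEY_BYTES = 4      # assumed 4-byte integer surrogate key


def inline_bytes_per_fact_row(avg_varchar_len: int) -> int:
    """Bytes the two degenerate columns add to each fact row when kept inline."""
    return CHAR_LEN + VARCHAR_OVERHEAD + avg_varchar_len


def surrogate_bytes_per_fact_row(avg_varchar_len: int,
                                 lines_per_txn: float = 4.0) -> float:
    """Bytes per fact row when the pair is moved to a separate dimension.

    Each fact row carries the integer key, and each transaction adds one
    dimension row (key + both columns), amortized over its lines.
    """
    dim_row_bytes = INT_KEY_BYTES + inline_bytes_per_fact_row(avg_varchar_len)
    return INT_KEY_BYTES + dim_row_bytes / lines_per_txn


if __name__ == "__main__":
    avg_len = 20  # hypothetical average length of the varchar(30) values
    print(inline_bytes_per_fact_row(avg_len))     # 28
    print(surrogate_bytes_per_fact_row(avg_len))  # 12.0
```

Under these assumptions the surrogate key saves well under 20 bytes per fact row in an uncompressed rowstore. In a clustered columnstore the figures will differ because each column is compressed separately, so treat this as a starting point and measure on your own data.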

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange