Implementing a degenerate dimension with an alphanumeric varchar(30) at the transaction level?
07-02-2021
Question
I am designing my first data model and am confused about one particular implementation of degenerate dimensions. According to Kimball's Design Tip 46, a degenerate dimension that is unwieldy (alphanumeric) can be replaced with a surrogate key. I have a two-column degenerate key, char(6) and varchar(30), on a fact table with a grain of one row per transaction line. Stored inline, the DD would consume far more space than a simple integer; however, there is no additional context to attach to it. Would it be best to leave it as is and accept the extra space, or is it worth moving it into a separate dimension, knowing that the dimension would grow in proportion to the fact table (to roughly 25% of its row count, since there are on average 4 lines per transaction)?
Solution
Kimball's guidance on performance and space use needs to be taken with a grain of salt when using modern columnar storage. For SQL Server data warehouses, fact tables should typically be stored as clustered columnstore indexes. Low-cardinality attributes in columnar storage are much cheaper than in uncompressed row stores.
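To illustrate why low-cardinality columns are cheap in a column store, here is a toy dictionary encoding in Python. It is only a sketch of the general idea, not SQL Server's actual columnstore format; the function name `dictionary_encode` and the sample values are made up for illustration.

```python
def dictionary_encode(values):
    """Map each distinct value to a small integer code, in first-seen order."""
    dictionary = {}
    codes = []
    for v in values:
        # setdefault assigns the next free code only when v is new
        codes.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), codes


# Hypothetical low-cardinality column: 6 rows, only 3 distinct strings
column = ["RETAIL", "WHOLESALE", "RETAIL", "RETAIL", "ONLINE", "WHOLESALE"]
distinct, codes = dictionary_encode(column)

# Each string is stored once; the rows themselves hold only small integers.
decoded = [distinct[c] for c in codes]
```

The fewer distinct values a column has, the smaller the dictionary and the narrower the codes, which is why the per-row cost of such a column drops so sharply compared to storing the full string on every row.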
Wide, high-cardinality attributes in a columnstore, by contrast, do have a large impact on storage size, but not on query performance, because a column is only fetched and scanned when a query actually references it. You don't have to fetch, cache, and read the whole row.
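For a rough sense of the space trade-off in the question, here is a back-of-envelope calculation in Python. It is a sketch under loud assumptions: every value fills its full declared width (6 + 30 = 36 single-byte characters), the surrogate key is a 4-byte integer, the fact table has a hypothetical 100 million rows, and compression, row overhead, and indexes are all ignored. Real columnstore sizes will differ, so measure on your own data.

```python
def raw_bytes(fact_rows, lines_per_txn=4, dd_bytes=36, key_bytes=4):
    """Uncompressed bytes for the degenerate dimension under both designs."""
    # Option A: keep char(6) + varchar(30) inline on every fact row
    inline = fact_rows * dd_bytes
    # Option B: integer key on the fact table, plus one dimension row
    # per transaction holding the key and the original values
    dim_rows = fact_rows // lines_per_txn
    surrogate = fact_rows * key_bytes + dim_rows * (key_bytes + dd_bytes)
    return inline, surrogate


# Hypothetical fact table of 100 million transaction lines
inline, surrogate = raw_bytes(100_000_000)
print(f"inline:    {inline / 1e9:.1f} GB")     # 3.6 GB
print(f"surrogate: {surrogate / 1e9:.1f} GB")  # 1.4 GB
```

Under these assumptions the surrogate-key design is smaller before compression, but it adds a join to a dimension a quarter the size of the fact table. Since the inline column is only read when a query references it, the extra space is mostly a storage cost rather than a query cost.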