Question

Profile of the foreign keys

FK      Distinct Values      %
----    ---------------  ------
Id1     1                 0.1%
,Id2    4                 0.3%
,Id3    5                 0.3%
,Id4    6                 0.4%
,Id5    6                 0.4%
,Id6    95                6.1%
,Id7    97                6.2%
,Id8    1423             90.7%

All foreign keys already make up the clustered Primary Key. This fact table is part of a star schema that includes 6 dimensions (Id's 6,7, and 8 reference the same date dimension).

Fact table currently has approx 1800 rows (incredibly small), and is expected to grow by that amount each month.

Should each foreign key have its own non-clustered non-unique single column index for facilitating joins? If so, why?

Each foreign key will be part of a clustered index (primary key) in its dimension table.

If indexes should be put on the foreign keys, then what should the fill factor and padding index be set to given the low cardinality of the columns?

Was it helpful?

Solution

Your profile doesn't really make sense with the "%" column - why are you finding the "percentage" of distinct values across fields? You need stats on the distribution of the distinct values - are 99% of the keys on Id8 the same ? are they evenly distributed? etc.

Note that everything I'm saying here applies to larger tables. With 1800 rows / month, indexes are probably a waste of space and time for you to worry about.

@jrara's "rule" about indexing all the dims is an easy rule to apply, but you can easily make mistakes if that's all you do. I don't want to use an oracle bitmap index on my 100mil row customer dimension, for example.

Indexing depends on what the queries look like against your data. Indexes won't help if you are doing a full scan of the fact table to perform aggregation and grouping for "summary" reports. They will help when a user is trying to filter on an attribute of a dimension, and that filter results in you only having to look up a small percentage of the records from the fact table. Is there a main entry point to your table? Do people typically filter on an attribute of the "Id8" dimension, then want grouping on an attribute from the other dimensions?

Essentially the answers to your questions are:

Should each foreign key have its own non-clustered non-unique single column index for facilitating joins?

In general, yes, so long as the dimension tables are small and the dim keys are relatively evenly distributed in the fact table. Usually it is worse to use a index access to get 99% of the fact table rows.

what should the fill factor and padding index be set to given the low cardinality of the columns?

Lowering the FILLFACTOR below 100% will cause slower index reads, since there are more (empty) pages in the index for the DB to read. Since a data warehouse is designed for fast selects, I don't really ever recommend that you adjust the fillfactor down.

That being said, in a few cases adjusting your FILLFACTOR may make sense. If the the fact table is very large (hundreds of GB / TB), and index rebuilds take hours, and you might only rebuild indexes once a month or even less. In these cases you need to figure out how much data (as a percentage) that you'll be adding to the table each day, and set the fillfactor accordingly.

OTHER TIPS

First of all, I think that you should not create a clustered primary key based on foreign keys. A clustered index is organizes the data on disk and it is better if it is

  • narrow
  • numeric
  • increasing (strictly monotonic)

So I think it is better to create e.g. a unique constraint on foreign keys to make row unique. Or create a non clustered primary key on those columns and then create a clustered index (but not primary key) on e.g. date foreign key (YYYYMMDD).

Usually foreign keys are indexed (non clustered, non unique) on Fact table to make faster searches. But some people does not enforce cardinality at all on dimensional model (ETL takes care of the referential integrity) because primary key - foreign key constraints makes ETL loads slow.

From Vincent Rainardi

  1. Question: How do you index a fact table? And explain why. {H}

Answer: Index all the dim key columns, individually, non clustered (SQL Server) or bitmap (Oracle). The dim key columns are used to join to the dimension tables, so if they are indexed the join will be faster. An exceptional candidate will suggest 3 additional things: a) index the fact key separately, b) consider creating a covering index in the right order on the combination of dim keys, and c) if the fact table is partitioned the partitioning key must be included in all indexes.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top