Question

I have seen a few references claiming that a primary key (PK) is not required on a fact table. I believe every single table should have a PK.

How can anyone understand what a row in a fact table represents if there is no PK, only 10+ foreign keys?


Solution

Primary Key is there

... but enforcing the primary key constraint at the database level is not required.

If you think about it, a unique key or primary key is technically any key that uniquely identifies each row, and it can be composed of more than one attribute of the entity. In a fact table, the foreign keys flowing in from the dimension tables together already act as a composite primary key: at the grain the table is designed for, each combination of foreign-key values identifies exactly one record. So this foreign-key combination is the primary key of the fact table.
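A minimal sketch of this idea using Python's built-in sqlite3 module, assuming a hypothetical sales star schema (all table and column names are illustrative, not from the question): the fact table declares the combination of its dimension foreign keys as its primary key, so a second row with the same combination is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

# Hypothetical star schema; all names are illustrative.
conn.executescript("""
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name    TEXT);

-- Grain: one row per date, product and store, so the combination of
-- dimension foreign keys is the fact table's primary key.
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date (date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
    store_key   INTEGER NOT NULL REFERENCES dim_store (store_key),
    quantity    INTEGER,
    amount      REAL,
    PRIMARY KEY (date_key, product_key, store_key)
);

INSERT INTO dim_date    VALUES (20240101, '2024-01-01');
INSERT INTO dim_product VALUES (1, 'Widget');
INSERT INTO dim_store   VALUES (10, 'Downtown');
INSERT INTO fact_sales  VALUES (20240101, 1, 10, 3, 29.97);
""")

try:
    # Same (date_key, product_key, store_key) as the existing row.
    conn.execute("INSERT INTO fact_sales VALUES (20240101, 1, 10, 1, 9.99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

No extra identifier column is needed: the foreign keys that every query already joins on are the key.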

Why not a Surrogate Key then?

You could, if you wanted, define a surrogate key for the fact table. But what purpose would it serve? You are never going to retrieve a single record from the fact table by its surrogate key (use indexes instead), nor are you going to use that surrogate key to join the fact table to other tables. Such a surrogate key would be a complete waste of space in the database.

Enforcing Database Constraints

When you declare this conceptual primary key at the database level, the database must ensure that the constraint is not violated by any DML operation performed on the table. Enforcing the constraint is an overhead for your database. It may be insignificant for an OLTP system, but for a large OLAP system where data are loaded in batches, it can incur a significant performance penalty. Besides, why have the database ensure the integrity of the constraint when you can ensure the same during the data loading phase itself (typically in your ETL code)?
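Here is a rough sketch of what "ensuring it in the ETL code" might look like, using Python's sqlite3 module. It assumes a hypothetical `fact_sales` table declared with no primary key and a grain of `(date_key, product_key, store_key)`; the function names are made up for illustration. The batch is checked for duplicates within itself and against rows already loaded before anything is inserted.

```python
import sqlite3
from collections import Counter

# Hypothetical grain of the fact table: one row per date, product and store.
GRAIN = ("date_key", "product_key", "store_key")


def find_grain_violations(batch, existing_keys):
    """Return grain-key tuples that would break uniqueness if the batch were loaded.

    batch: list of dicts, one per incoming fact row.
    existing_keys: set of grain tuples already present in the fact table.
    """
    counts = Counter(tuple(row[col] for col in GRAIN) for row in batch)
    duplicated_in_batch = {key for key, n in counts.items() if n > 1}
    already_loaded = set(counts) & existing_keys
    return duplicated_in_batch | already_loaded


def load_batch(conn, batch):
    """Validate the grain in the ETL step, then bulk insert without a DB-level key."""
    # Reading every existing key is only for illustration; a real load would
    # compare against the affected partition or a staging table instead.
    existing = set(conn.execute(
        "SELECT date_key, product_key, store_key FROM fact_sales"))
    bad = find_grain_violations(batch, existing)
    if bad:
        raise ValueError(f"grain violated for keys: {sorted(bad)}")
    with conn:  # the whole batch is committed as one transaction
        conn.executemany(
            "INSERT INTO fact_sales VALUES "
            "(:date_key, :product_key, :store_key, :quantity, :amount)",
            batch)


conn = sqlite3.connect(":memory:")
# No PRIMARY KEY here: uniqueness is the ETL code's responsibility.
conn.execute("""
CREATE TABLE fact_sales (
    date_key INTEGER, product_key INTEGER, store_key INTEGER,
    quantity INTEGER, amount REAL
)""")

rows = [
    {"date_key": 20240101, "product_key": 1, "store_key": 10, "quantity": 3, "amount": 29.97},
    {"date_key": 20240101, "product_key": 2, "store_key": 10, "quantity": 1, "amount": 4.50},
]
load_batch(conn, rows)

try:
    load_batch(conn, rows[:1])  # same grain as a row already loaded
    rerun_rejected = False
except ValueError:
    rerun_rejected = True
```

The check runs once per batch instead of once per inserted row, which is the trade-off the answer describes; the price is that the guarantee now depends on every load path going through this code.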

OTHER TIPS

You are absolutely right that, in principle, a fact table should have a key. From a data modelling point of view, it is required. In implementation, however, key constraints in the database usually require an index. The overhead of creating and maintaining indexes is such that the uniqueness of the "key" attributes is sometimes maintained by controls in the integration layer (the "ETL process") rather than by a constraint in the database.

Whenever practical, it does make sense to create the key constraint within the database. If the key isn't explicitly defined in the database, it ought to be clearly documented so that users can understand what the data means.
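When the key is documented rather than declared, a periodic audit query can at least confirm that the loaded data still honors it. A minimal sketch with Python's sqlite3 module, assuming a hypothetical `fact_sales` table whose documented key is `(date_key, product_key, store_key)`: every group the query returns is a key violation.

```python
import sqlite3

# Audit for a documented (not declared) key: any group with more than one
# row means the documented key no longer holds.
DUPLICATE_KEY_AUDIT = """
SELECT date_key, product_key, store_key, COUNT(*) AS n
FROM fact_sales
GROUP BY date_key, product_key, store_key
HAVING COUNT(*) > 1
"""

conn = sqlite3.connect(":memory:")
# Hypothetical fact table with no key constraint declared.
conn.execute("""
CREATE TABLE fact_sales (
    date_key INTEGER, product_key INTEGER, store_key INTEGER, amount REAL
)""")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20240101, 1, 10, 29.97),
    (20240101, 2, 10, 4.50),
    (20240101, 1, 10, 9.99),  # breaks the documented key
])

violations = conn.execute(DUPLICATE_KEY_AUDIT).fetchall()
print(violations)  # [(20240101, 1, 10, 2)]
```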

As other answers explain, a primary key constraint is not required; however, a fact table surrogate key may be helpful at the physical level.

Here is a Kimball design tip on fact table surrogate keys:

There are a few circumstances when assigning a surrogate key to the rows in a fact table is beneficial:

  1. Sometimes the business rules of the organization legitimately allow multiple identical rows to exist for a fact table. Normally as a designer, you try to avoid this at all costs by searching the source system for some kind of transaction time stamp to make the rows unique. But occasionally you are forced to accept this undesirable input. In these situations it will be necessary to create a surrogate key for the fact table to allow the identical rows to be loaded.

  2. Certain ETL techniques for updating fact rows are only feasible if a surrogate key is assigned to the fact rows. Specifically, one technique for loading updates to fact rows is to insert the rows to be updated as new rows, then to delete the original rows as a second step as a single transaction. The advantages of this technique from an ETL perspective are improved load performance, improved recovery capability and improved audit capabilities. The surrogate key for the fact table rows is required as multiple identical primary keys will often exist for the old and new versions of the updated fact rows between the time of the insert of the updated row and the delete of the old row.

  3. A similar ETL requirement is to determine exactly where a load job was suspended, either to resume loading or to back out the job entirely. A sequentially assigned surrogate key makes this task straightforward.

(source: Design Tip #81 Fact Table Surrogate Key)
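The three cases from the tip can be sketched with Python's sqlite3 module. This is an illustration under assumptions, not Kimball's code: the table, column and function names are hypothetical, and `fact_sk` is an `INTEGER PRIMARY KEY AUTOINCREMENT` column, which SQLite assigns in increasing order and never reuses, standing in for a sequentially assigned surrogate key. For case 3, the rows above a recorded checkpoint are exactly what the interrupted job wrote, so they can be inspected to resume or deleted to back out; the sketch shows backing out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table. fact_sk is the sequential surrogate key; with
# AUTOINCREMENT, SQLite never reuses a value, even after deletes.
conn.execute("""
CREATE TABLE fact_sales (
    fact_sk     INTEGER PRIMARY KEY AUTOINCREMENT,
    date_key    INTEGER NOT NULL,
    product_key INTEGER NOT NULL,
    store_key   INTEGER NOT NULL,
    amount      REAL
)""")

INSERT_FACT = ("INSERT INTO fact_sales (date_key, product_key, store_key, amount) "
               "VALUES (?, ?, ?, ?)")

# Case 1: two legitimately identical business rows can both be loaded,
# because each receives its own fact_sk (1 and 2 here).
with conn:
    conn.executemany(INSERT_FACT, [(20240101, 1, 10, 9.99),
                                   (20240101, 1, 10, 9.99)])


def replace_rows(conn, old_sks, new_rows):
    """Case 2: update by inserting new versions, then deleting the old ones.

    Between the two steps, old and new versions share the same dimension
    keys; only fact_sk distinguishes them. Both steps commit together or
    roll back together.
    """
    with conn:
        conn.executemany(INSERT_FACT, new_rows)
        conn.executemany("DELETE FROM fact_sales WHERE fact_sk = ?",
                         [(sk,) for sk in old_sks])


def current_checkpoint(conn):
    """Case 3: record the highest surrogate key before a load job starts."""
    return conn.execute(
        "SELECT COALESCE(MAX(fact_sk), 0) FROM fact_sales").fetchone()[0]


def back_out_load(conn, checkpoint_sk):
    """Case 3: remove every row a suspended job inserted after the checkpoint."""
    with conn:
        conn.execute("DELETE FROM fact_sales WHERE fact_sk > ?", (checkpoint_sk,))


# Correct the amount on row 1 (one of the identical pair); new version gets fact_sk 3.
replace_rows(conn, [1], [(20240101, 1, 10, 12.49)])

checkpoint = current_checkpoint(conn)  # 3
with conn:  # simulate a load job that committed some rows before being suspended
    conn.executemany(INSERT_FACT, [(20240102, 1, 10, 5.00),
                                   (20240102, 2, 10, 7.50)])
back_out_load(conn, checkpoint)

print(conn.execute("SELECT fact_sk, amount FROM fact_sales ORDER BY fact_sk").fetchall())
# [(2, 9.99), (3, 12.49)]
```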

Because the foreign keys in the fact table come from the primary keys of the dimension tables, their combined values uniquely identify each record of the fact table. In this way, the foreign keys themselves act as the primary key.

Licensed under: CC-BY-SA with attribution