Database design for ETL - Surrogate vs natural key

https://stackoverflow.com/questions/21578644

07-10-2022

Question

We are currently in the process of redesigning our ETL database.

So far we have used a design with natural keys: CustomerID, OrderID and SystemType.

The OrderID can be repeated for different customers, which is why the SystemType key helps us create a unique index. Our joins are complicated because we always need to join on all three keys.

We would like to use a surrogate key, but when another extract comes into the system we cannot identify the rows, because our surrogate key is not included in the customer's extract.

Should we use the three columns as a composite primary key, or should we concatenate them into one column and use that as the primary key? I understand an autoincrement key is not an option.

Could you share your thoughts on the preferred key design for a system like this?

Thanks,

Mathias

Solution

In ETL scenarios it is usual to have both kinds of key. You need the natural key to distinguish new rows from updated rows, and you must maintain its uniqueness as you load data. Then assign a surrogate key to any new rows if you need one. Foreign keys in other tables can reference either the surrogate key or the natural key, whichever you prefer. If the natural key attributes already exist as foreign key references in other tables, cascading surrogate keys through the schema can cost far more than simply leaving the natural key values as they are.
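As a rough illustration of the "have both" approach, here is a minimal sketch using Python's built-in sqlite3 module. The table name `orders`, the surrogate column `order_sk`, the payload column `amount`, and the helper `load_extract` are all hypothetical and not taken from the question; the upsert syntax assumes SQLite 3.24 or later. The natural key (CustomerID, OrderID, SystemType) carries a unique constraint and is used to match incoming rows, while the surrogate key is assigned only when a row is new.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_sk    INTEGER PRIMARY KEY,  -- surrogate key, assigned on insert
        customer_id INTEGER NOT NULL,
        order_id    INTEGER NOT NULL,
        system_type TEXT    NOT NULL,
        amount      REAL,                 -- hypothetical payload column
        UNIQUE (customer_id, order_id, system_type)  -- natural key
    )
    """
)


def load_extract(conn, rows):
    """Upsert extract rows, matching existing rows on the natural key.

    New natural keys are inserted and receive a fresh surrogate key;
    existing ones are updated in place and keep their surrogate key.
    """
    conn.executemany(
        """
        INSERT INTO orders (customer_id, order_id, system_type, amount)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (customer_id, order_id, system_type)
        DO UPDATE SET amount = excluded.amount
        """,
        rows,
    )
    conn.commit()


# First extract: two new rows sharing the same order_id.
load_extract(conn, [(1, 100, "A", 10.0), (2, 100, "A", 20.0)])
# Second extract: one updated row and one new row.
load_extract(conn, [(1, 100, "A", 15.0), (3, 100, "B", 5.0)])

query = (
    "SELECT order_sk, customer_id, order_id, system_type, amount "
    "FROM orders ORDER BY order_sk"
)
for row in conn.execute(query):
    print(row)
```

Other tables could then join on the single `order_sk` column instead of all three natural key columns, although, as noted above, whether that is worth the cost of cascading the new key through an existing schema depends on the system.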

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow