Relational database design question - Surrogate-key or Natural-key?

https://stackoverflow.com/questions/3747730

04-10-2019
|

Question

Which one is the best practice and Why?

a) Type Table, Surrogate/Artificial Key

Foreign key is from user.type to type.id: alt text

b) Type Table, Natural Key

Foreign key is from user.type to type.typeName: alt text

Solution

I believe that in practice, using a natural key is rarely the best option. I would probably go for the surrogate key approach as in your first example.

The following are the main disadvantages of the natural key approach:

You might have an incorrect type name, or you may simply want to rename the type. To edit it, you would have to update all the tables that would be using it as a foreign key.
An index on an int field will be much more compact than one on a varchar field.
In some cases, it might be difficult to have a unique natural key, and this is necessary since it will be used as a primary key. This might not apply in your case.

OTHER TIPS

The first one is more future proof, because it allows you to change the string representing the type without updating the whole user table. In other words you use a surrogate key, an additional immutable identifier introduced for the sake of flexibility.

A good reason to use a surrogate key (instead of a natural key like name) is when the natural key isn't really a good choice in terms of uniqueness. In my lifetime i've known no fewer than 4 "Chris Smith"s. Person names are not unique.

I prefer to use the surrogate key. It is often people will identity and use the natural key which will be fine for a while, until they decide they want to change the value. Then problems start.

You should probably always use an ID number (that way if you change the type name, you don't need to update the user table) it also allows you to keep your datasize down, as a table full of INTs is much smaller than one full of 45 character varchars.

If typeName is a natural key, then it's probably the preferable option, because it won't require a join to get the value.

You should only really use a surrogate key (id) when the name is likely to change.

Surrogate key for me too, please.

The other might be easier when you need to bang out some code, but it will eventually be harder. Back in the day, my tech boss decided using an email addr as a primary key was a good idea. Needless to say, when people wanted to change their addresses it really sucked.

Use natural keys whenever they work. Names usually don't work. They are too mutable.

If you are inventing your own data, you might as well invent a syntheic key. If you are building a database of data provided by other people or their software, analyze the source data to see how they identify things that need identification.

If they are managing data at all well, they will have natural keys that work for the important stuff. For the unimportant stuff, suit yourself.

well i think surrgote key is helpful when you don't have any uniquely identified key whose value is related and meaningful as is to be its primary key... moreover surrgote key is easier to implement and less overhead to maintain.

but on the other hand surrgote key is sometimes make extra cost by joining tables. think about 'User' ... I have

UserId varchar(20), ID int, Name varchar(200)

as the table structure.

now consider that i want to take a track on many tables as who is inserting records... if i use Id as a primary key, then [1,2,3,4,5..] etc will be in foreign tables and whenever i need to know who is inserting data i've to join User Table with it because 1,2,3,4,5,6 is meaningless. but if i use UserId as a primary key which is uniquely identified then on other foreign tables [john, annie, nadia, linda123] etc will be saved which is sometimes easily distinguishable and meaningful . so i need not to join user table everytime when i do query.

but mind it, it takes some extra physical space as varchar is saved in foreign tables which takes extra bytes.. and ofcourse indexing has a significant performance issue where int performs better rather than varchar

Surrogate key is a substitution for the natural primary key. It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow