What are the consequences, positive or negative, to having a surrogate primary key for a table which already has a guaranteed unique column? [closed]

https://dba.stackexchange.com/questions/161630

05-10-2020

Question

I recently made a table that looks like the one below:

CREATE TABLE example(
  id SERIAL PRIMARY KEY,
  subindustry_id INT UNIQUE NOT NULL,
  name TEXT NOT NULL
  -- other fields...
);

I realized that in this table I could just remove the column called id and make subindustry_id the primary key.
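
A minimal sketch of that alternative, assuming PostgreSQL syntax and the same columns as above. The table name example_natural is hypothetical and is used only to avoid clashing with the original table:

```sql
-- Hypothetical variant of the table above: the surrogate id column is gone
-- and the already-unique column serves as the primary key.
CREATE TABLE example_natural (
  subindustry_id INT PRIMARY KEY,  -- PRIMARY KEY implies UNIQUE and NOT NULL
  name TEXT NOT NULL
  -- other fields...
);
```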

Thinking back, the only reason I can think of for why I didn't make subindustry_id the primary key is for the sake of consistency, considering that I have a lot of other tables with a column called id.

Except for things like consistency*, are there any tangible benefits/downsides to having this additional id column as the primary key when there's already another column that'd be a good primary key?

*Consistency as in "that's the way the rest of the tables look". The kind where you're enforcing a certain type of style. I guess the point of this question is to question the practice of giving every single table an id column regardless of the fact that they might already have a more suitable candidate for a primary key.

Solution

I agree a lot with comments from @MDCCL:

You should analyze each particular case —that is, every particular table within its whole context, that is, the entire database structure— to define if it requires the appending of a system-generated surrogate column, as such an artifact is an additional aspect that requires its particular administration. Consistency with the way the rest of the tables look is irrelevant, from the logical and physical points of view

A few pointers from my side (the ones that come to mind at the moment, and that are probably not all there is to it):

Negative consequences:

You need more space for everything: each row is wider, and you have an extra index to maintain. Every write takes a little more time, and so does every read, because you read and write more bytes to handle the same real information. Say you log the readings of several thermometers, one reading per second. You have one table thermometers (thermometer_id PK, ...etc...) and one table readings (thermometer_id, reading_timestamp, value_read). The readings table has a natural key, (thermometer_id, reading_timestamp), because the same thermometer cannot have two different readings at the same time. I would normally make this my PK and forget about surrogates.
Using surrogate keys tends to obscure the real structure and meaning of your data. For instance, imagine you have the tables persons, books and books_liked_by_persons. This last table is a many-to-many (m x n) table, and most of the time it contains only (person_id, book_id) pairs. Adding a third id in such cases is most probably meaningless and noisy.
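
The two examples above could be sketched as follows, in PostgreSQL syntax. Only the table names and the key columns come from the text. The column types and the foreign key declarations are assumptions made for illustration:

```sql
-- Thermometer log: the natural composite key is the primary key.
CREATE TABLE thermometers (
  thermometer_id INT PRIMARY KEY
  -- ...etc...
);

CREATE TABLE readings (
  thermometer_id    INT         NOT NULL REFERENCES thermometers (thermometer_id),
  reading_timestamp TIMESTAMPTZ NOT NULL,  -- type is an assumption
  value_read        NUMERIC     NOT NULL,  -- type is an assumption
  -- One thermometer cannot have two readings at the same instant.
  PRIMARY KEY (thermometer_id, reading_timestamp)
);

-- Many-to-many table: the pair of foreign keys is the whole key,
-- so a third id column would add nothing.
CREATE TABLE persons (person_id INT PRIMARY KEY);
CREATE TABLE books   (book_id   INT PRIMARY KEY);

CREATE TABLE books_liked_by_persons (
  person_id INT NOT NULL REFERENCES persons (person_id),
  book_id   INT NOT NULL REFERENCES books (book_id),
  PRIMARY KEY (person_id, book_id)
);
```

With these definitions, the primary key index on readings already enforces the "no two readings at the same time" rule, so no separate unique constraint is needed.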

Positive consequences:

In some very specific cases, you may want different tables to share part of their structure (let's say, for instance, you want some of them to always have a country_code and an xxx_id), so that an application is easier to develop because the same pattern can be used to process different tables. [This tends to be rather uncommon, AFAIK, but it is possible.]
Using a surrogate key, even when other candidate keys exist, might help you save space (and probably time). Imagine your natural key is a PGP key (note 1). It takes roughly a char(800) to store it. Imagine you have several tables that reference this one. Putting a surrogate integer key (4 bytes for integer, 8 for bigint) on the original table will save a lot of space all around, because the referencing tables will store 4 (or 8) bytes instead of about 800. The accounting needs to be done for each specific case.
If you ever have to edit your data "by hand" (and, some day, you probably will), you don't want to look for keys which are really long. You prefer surrogates in that case.
Some databases (not PostgreSQL, at least as of now) cluster the data of your tables using the primary key as a clustering index (or do so by default unless you explicitly state otherwise). In such cases, it is normally advisable to have an always-incrementing primary key, to avoid fragmentation.
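
The PGP key case above might look like the sketch below, in PostgreSQL syntax. The table and column names (pgp_keys, key_owners, owner_email) are hypothetical and chosen only for illustration. The long key stays unique in one place, and referencing tables carry the short surrogate instead:

```sql
-- The long natural key is stored once and kept unique;
-- the surrogate is what other tables point at.
CREATE TABLE pgp_keys (
  pgp_key_id SERIAL PRIMARY KEY,
  pgp_key    TEXT NOT NULL UNIQUE   -- roughly 800 characters per value
);

-- Hypothetical referencing table: it stores the small integer,
-- not a copy of the key text.
CREATE TABLE key_owners (
  pgp_key_id  INT  PRIMARY KEY REFERENCES pgp_keys (pgp_key_id),
  owner_email TEXT NOT NULL
);

-- Reports the storage size in bytes of each integer type,
-- so the accounting can be checked rather than assumed.
SELECT pg_column_size(1::integer) AS integer_bytes,
       pg_column_size(1::bigint)  AS bigint_bytes;
```

Note that the UNIQUE constraint on pgp_key still needs its own index, so the saving comes from the referencing tables and their indexes, not from the original table.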

So: every case needs analyzing, weighing and deciding. Sometimes consistency may be confused with flying on autopilot.

Side note: There are several different naming conventions. I prefer the one where id is never used, and a persons table has a person_id column if needed.

Note 1: A PGP key looks more or less like:

mQENBFiBGbcBCADBAe1p70ZPGwM6gmAAFTd13gv68rOUQEqlhddjkJZoiKs8SJuX m4Hpu29+4msQ91071R2yGZ+iydO2laRqWD1jFmF1qOTWiwYkrxYc288/XLbGi7NY gz9CPKlaH8N4VJBPC6eoRJgpUawBygERf+YFxBEXipWrZbusFrnHCilGGuOwIDAc UmYf7XNFZrg4+i+QEkBqWnBtd4Q7tkdTTywFDMZ3H+fwrDp+8M3zUwi10PCxJbQd JkKzXDFemaOHiDxWKNRg7CFyCBRq5K6mfXZuPeAtVOxFMFRnvyJ75HWv4Ss5n40b tpL/kmSQ9cHgl0nWHS+LMF55zZsAKToNaW6FABEBAAG0FXNvbWV0aGluZ0Bub3do ZXJlLmNvbYkBHAQQAQIABgUCWIEZtwAKCRDigzJ94dxj1hujCACFt22wZD5yn5/J 3cGLonszqgDRMNzmkdlJjiZi+2fC51z2mkQcy3rlzrQX9K1Mu3TLxPfwyMMiZkut trleZI4KyAD1hQRm4hQ4f+xx++xOqnMh5c7KsE6AwnGxeDgLy8hD3IJCmrDFwZ93 4XgZ+iXEzFxMcwmLeJ7xg7re4NTlovpIj1QetXqfEHydHQ1AJCqBjGSs805DBHGO nKAwBXuPI67Kcry+HIhn4ZaIXmXLGvXStJAz2bX8o/nQaVoC/YOyUTMZadR3jtdf ka5OeyvudCiKTNL3I1oDbhFFfGEhF2w66wp8acJ4z5bYkaQ5RFpEMI1ISfSn7Ypg FZln1sTo =RYbd

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange