Question

Should I always have a primary key in my database tables?

Let's take the SO tagging. You can see the tag in any revision, its likely to be in a tag_rev table with the postID and revision number. Would I need a PK for that?

Also since it is in a rev table and not currently use the tags should be a blob of tagIDs instead of multiple entries of multiple post_id tagid pair?

Was it helpful?

Solution

You should strive to have a primary key in any non-trivial table where you're likely to want to access (or update or delete) individual records by that key. Primary keys can consist of multiple columns, and formally speaking, will be the shortest available superkey; that is, the shortest available group of columns which, together, uniquely identify any row.

I don't know what the Stack Overflow database schema looks like (and from some of the things I've read on Jeff's blog, I don't want to), but in the situation you describe, it's entirely possible there is a primary key across the post identifier, revision number and tag value; certainly, that would be the shortest (and only) superkey available.

With regards to your second point, while it may be reasonable to argue in favour of aggregating values in archive tables, it does go against the principle that each row/column intersection in a table ought to contain one single value. While it may slightly simplify development, there is no reason you can't keep to a normalised table with versioned metadata, even for something as trivial as tags.

OTHER TIPS

A table should have a primary key so that you could identify each row uniquely with it.

Technically, you can have tables without a primary key, but you'll be breaking good database design rules.

I tend to agree that most tables should have a primary key. I can only think of two times where it doesn't make sense to do it.

  1. If you have a table that relates keys to other keys. For example, to relate a user_id to an answer_id, that table wouldn't need a primary key.
  2. A logging table, whose only real purpose is to create an audit trail.

Basically, if you are writing a table that may ever need to be referenced in a foreign key relationship then a primary key is important, and if you can't be positive it won't be, then just add the PK. :)

See this related question about whether an integer primary key is required. One of the answers uses tagging as an example:

Are there any good reasons to have a database table without an integer primary key

For more discussion of tagging and keys, see this question:

Id for tags in tag systems

From MySQL 5.5 Reference Manual section 13.1.17:

If you do not have a PRIMARY KEY and an application asks for the PRIMARY KEY in your tables, MySQL returns the first UNIQUE index that has no NULL columns as the PRIMARY KEY.

So, technically, the answer is no. However, as others have stated, in most cases it is quite useful.

I firmly believe every table should have a way to uniquely identify a record. For 99% of the tables, this is a primary key. For the rest you may get away with a unique index (I'm thinking one column look up type tables here). Any time I have a had to work with a table without a way to uniquely identify records, there has been trouble.

I also believe if you are using surrogate keys as your PK, you should, where at all possible, have a separate unique index on whatever combination of fields make up the natural key. I realize there are all too many times when you don't have a true natural key (names are not unique or what makes something unique might be spread across several parentchild tables), but if you do have one, please please please make sure it has a unique index or is created as the PK.

If there is no PK, how will you update or delete a single row ? It would be impossible ! To be honest I have used a few times tables without PK, for instance to store activity logs, but even in this case it is advisable to have one because the timestamps could not be granular enough. Temporary tables is another example. But according to relational theory the PK is mandatory.

it is good to have keys and relationships . Helps a lot. however if your app is good enough to handle the relationships then you could possibly skip the keys ( although i recommend that you have them )

Since I use Subsonic, I always create a primary key for all of my tables. Many DB Abstraction libraries require a primary key to work.

Note: that doesn't answer the "Grand Unified Theory" tone of your question, but I'm just saying that in practice, sometimes you MUST make a primary key for every table.

If it's a join table then I wouldn't say that you need a primary key. Suppose, for example, that you have tables PERSONS, SICKPEOPLE, and ILLNESSES. The ILLNESSES table has things like flu, cold, etc., each with a primary key. PERSONS has the usual stuff about people, each also with a primary key. The SICKPEOPLE table only has people in it who are sick, and it has two columns, PERSONID and ILLNESSID, foreign keys back to their respective tables, and no primary key. The PERSONS and ILLNESSES tables contain entities and entities get primary keys. The entries in the SICKPEOPLE table aren't entities and don't get primary keys.

Databases don't have keys, per se, but their constituent tables might. I assume you mean that, but just in case...

Anyway, tables with a large number of rows should absolutely have primary keys; tables with only a few rows don't need them, necessarily, though they don't hurt. It depends upon the usage and the size of the table. Purists will put primary keys in every table. This is not wrong; and neither is omitting PKs in small tables.

Edited to add a link to my blog entry on this question, in which I discuss a case in which database administration staff did not consider it necessary to include a primary key in a particular table. I think this illustrates my point adequately.

Cyberherbalist's Blog Post on Primary Keys

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top