Question

I want to know whether I can use human-readable primary keys for a relatively small number of database objects that describe large metropolitan areas.

For example, using "washington_dc" as the pk for the Washington, DC metro area, or "nyc" for the New York City one.

Tons of objects will be foreign keyed to these metro area objects, and I'd like to be able to tell where a person or business is located just by looking at their database record.

I'm just worried because my gut tells me this might be a serious crime against good practices.

So, am I "allowed" to do this kind of thing?

Thanks!


Solution

It all depends on the application: natural primary keys make a good deal of sense on the surface, since they are human-readable and don't require any joins when displaying data to end users.
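Here is a minimal sketch of the readability point, using Python's built-in sqlite3 module and an in-memory database. The `metro` and `business` tables, their columns, and the sample rows are hypothetical, chosen to mirror the question. The only point is that a text key appears as-is in the referencing row.

```python
import sqlite3

# In-memory database; the metro/business schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE metro (code TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE business ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " metro_code TEXT NOT NULL REFERENCES metro(code))"
)
conn.execute("INSERT INTO metro VALUES ('washington_dc', 'Washington, DC')")
conn.execute(
    "INSERT INTO business (name, metro_code) VALUES ('Acme Corp', 'washington_dc')"
)

# The location is readable straight from the business row, no join needed.
row = conn.execute("SELECT name, metro_code FROM business").fetchone()
print(row)  # ('Acme Corp', 'washington_dc')
```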

However, natural primary keys tend to be larger than INT (or even BIGINT) surrogate primary keys, and there are very few domains where there isn't some danger of a natural primary key changing. To take your example, a city changing its name is not a terribly uncommon occurrence. When a city's name changes, you are left with either an update that has to touch every row that uses the city as a foreign key, or a primary key that no longer reflects reality ("The data shows Leningrad, but it really is St. Petersburg.")
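The rename scenario can be sketched the same way. This assumes an engine that supports `ON UPDATE CASCADE` on foreign keys. SQLite does once `PRAGMA foreign_keys` is on, but support varies between engines, and the answer itself does not mention cascades. The `city` and `person` tables and the codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for FK actions in SQLite
conn.execute("CREATE TABLE city (code TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY,"
    " city_code TEXT NOT NULL"
    " REFERENCES city(code) ON UPDATE CASCADE)"
)
conn.execute("INSERT INTO city VALUES ('leningrad')")
conn.executemany("INSERT INTO person (city_code) VALUES (?)", [("leningrad",)] * 3)

# Renaming the natural key: the cascade rewrites every referencing row.
conn.execute("UPDATE city SET code = 'st_petersburg' WHERE code = 'leningrad'")
codes = [r[0] for r in conn.execute("SELECT city_code FROM person")]
print(codes)  # ['st_petersburg', 'st_petersburg', 'st_petersburg']
```

The cascade automates the update but does not avoid it: every referencing row is still rewritten, which on a large table is exactly the cost the answer describes.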

So in sum, natural primary keys:

  1. Take up more disk space (most of the time)
  2. Are more susceptible to change (in the majority of cases)
  3. Are more human readable (as long as they don't change)

Whether #1 and #2 are sufficiently counteracted by #3 depends on what you are building and what its use is.
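A rough back-of-the-envelope calculation helps with point #1. This is a sketch under deliberately simplified assumptions: a fixed 4-byte INT (as in SQL Server), text keys counted as their raw UTF-8 bytes, and no length prefixes, padding, or index overhead, all of which vary by engine. The 10 million referencing rows are a made-up figure, and `fk_column_bytes` is a hypothetical helper.

```python
# Assumptions (hypothetical, engine-agnostic): a fixed 4-byte INT surrogate key,
# a text key stored as its raw UTF-8 bytes, and no per-value overhead.
INT_BYTES = 4

def fk_column_bytes(referencing_rows: int, key: str) -> tuple[int, int]:
    """Return (text_key_bytes, int_key_bytes) for one foreign-key column."""
    text_bytes = referencing_rows * len(key.encode("utf-8"))
    int_bytes = referencing_rows * INT_BYTES
    return text_bytes, int_bytes

text_b, int_b = fk_column_bytes(10_000_000, "washington_dc")
print(text_b, int_b)  # 130000000 40000000
```

Under the same assumptions, a three-letter code such as "nyc" would take 3 bytes per row, less than the INT, which is why point #1 only holds most of the time.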

OTHER TIPS

I think that the question "What are the design criteria for primary keys?" gives a really good overview of the trade-offs you might be making. I think the answer given there is the correct one, but its brevity belies the significant thinking you actually have to do to work out what's right for you.

(From that answer) The criteria for consideration of a primary key are:

  • Uniqueness
  • Irreducibility (no subset of the key uniquely identifies a row in the table)
  • Simplicity (so that relational representation & manipulation can be simpler)
  • Stability (should not be altered frequently)
  • Familiarity (meaningful to the user)
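The first two criteria are mechanical enough to check against sample data. The sketch below uses hypothetical helper names and represents rows as plain dictionaries; it tests whether a set of columns is unique and whether any smaller subset of it already is. Passing on today's data does not prove a column set is a good key: stability and familiarity remain judgment calls about the domain.

```python
from itertools import combinations

def is_unique(rows, cols):
    """True if no two rows share the same values in cols."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, cols):
    """Unique, and irreducible: no smaller non-empty subset of cols is unique."""
    if not is_unique(rows, cols):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(cols))
        for subset in combinations(cols, size)
    )

# Hypothetical sample rows.
rows = [
    {"code": "nyc", "name": "New York City", "state": "NY"},
    {"code": "buffalo", "name": "Buffalo", "state": "NY"},
    {"code": "washington_dc", "name": "Washington, DC", "state": "DC"},
    {"code": "portland_or", "name": "Portland", "state": "OR"},
    {"code": "portland_me", "name": "Portland", "state": "ME"},
]
print(is_candidate_key(rows, ["code"]))           # True
print(is_candidate_key(rows, ["name"]))           # False: two Portlands
print(is_candidate_key(rows, ["name", "state"]))  # True
print(is_candidate_key(rows, ["code", "name"]))   # False: "code" alone is unique
```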

For what it's worth, the small number of times I've had scaling problems from choosing strings as the primary key is about the same as the number of times I've had problems with redundant data from using an autoincrement key. The problems that arise with autoincrement keys are worse, in my opinion, because you usually don't notice them until much later.
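The kind of redundancy an autoincrement key can hide is easy to reproduce. In this hypothetical SQLite sketch, a table keyed only by a surrogate `id` silently accepts the same metro code twice. The second half adds a `UNIQUE` constraint on the natural column alongside the surrogate key; that is a common mitigation, not something the answer itself proposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Surrogate key only: nothing stops the same metro area being entered twice.
conn.execute("CREATE TABLE metro (id INTEGER PRIMARY KEY, code TEXT NOT NULL)")
conn.execute("INSERT INTO metro (code) VALUES ('nyc')")
conn.execute("INSERT INTO metro (code) VALUES ('nyc')")  # silently accepted
dupes = conn.execute("SELECT COUNT(*) FROM metro WHERE code = 'nyc'").fetchone()[0]
print(dupes)  # 2

# Same surrogate key plus UNIQUE on the natural column: the duplicate is rejected.
conn.execute("CREATE TABLE metro2 (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE)")
conn.execute("INSERT INTO metro2 (code) VALUES ('nyc')")
try:
    conn.execute("INSERT INTO metro2 (code) VALUES ('nyc')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```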

A primary key must be unique and immutable. A human-readable string can be used as a PK as long as it meets both of those requirements.
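If you do use a text PK, the immutability requirement can be enforced by the database rather than just hoped for. One option, sketched here for SQLite with a hypothetical table and trigger name, is a `BEFORE UPDATE` trigger that aborts any statement changing the key while still allowing other columns to be edited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metro (code TEXT PRIMARY KEY, name TEXT NOT NULL)")
# Hypothetical guard: abort any UPDATE that changes the key value.
conn.execute(
    "CREATE TRIGGER metro_code_immutable"
    " BEFORE UPDATE OF code ON metro"
    " WHEN NEW.code <> OLD.code"
    " BEGIN SELECT RAISE(ABORT, 'metro.code is immutable'); END"
)
conn.execute("INSERT INTO metro VALUES ('nyc', 'New York City')")
conn.execute("UPDATE metro SET name = 'New York' WHERE code = 'nyc'")  # allowed

try:
    conn.execute("UPDATE metro SET code = 'new_york' WHERE code = 'nyc'")
    blocked = False
except sqlite3.DatabaseError:  # the trigger's RAISE(ABORT, ...) fails the statement
    blocked = True
print(blocked)  # True
```

Renaming a key then becomes a deliberate step (drop the trigger, change the key, recreate the trigger) rather than an accidental edit.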

In the example you've given, it sounds fine, given that cities seldom change their names (and in the rare event that one does, you can change the PK value with enough effort).

One of the main reasons to use numeric PKs instead of strings is performance (the other is to take advantage of automatically incrementing IDs; see IDENTITY). If you anticipate more than a hundred queries per second on your textual PK, I would switch to int or bigint as the PK type. When you reach that level of database size and complexity, you tend to stop using SSMS (SQL Server Management Studio) to edit table data directly and use your own tools instead, which would presumably perform a JOIN so that you get the city name in the same result set as the city's numeric PK.
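A minimal sketch of that arrangement, again in SQLite with hypothetical table names: the referencing row stores only an integer, and a JOIN puts the readable city name back into the same result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.execute(
    "CREATE TABLE business ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city_id INTEGER NOT NULL REFERENCES city(id))"
)
# The database assigns the integer key; we read it back instead of hard-coding it.
city_id = conn.execute("INSERT INTO city (name) VALUES ('Washington, DC')").lastrowid
conn.execute("INSERT INTO business (name, city_id) VALUES (?, ?)", ("Acme Corp", city_id))

# The business row holds only the integer; the JOIN restores the readable name.
rows = conn.execute(
    "SELECT b.name, c.id, c.name FROM business AS b JOIN city AS c ON c.id = b.city_id"
).fetchall()
print(rows)  # [('Acme Corp', 1, 'Washington, DC')]
```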

You are allowed.

It is generally not the best practice.

Numeric, auto-incrementing keys are preferred. They are easily maintained and let you build input forms and other interfaces where the user does not have to think up a new string as a key.

Imagine: should it be washington, washington_dc, dc, washingtondc, etc.?
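A rough sketch of the input-form argument: `add_metro` is a hypothetical form handler in which the user supplies only a display name and the database generates the key, so nobody has to choose between `dc` and `washington_dc`. SQLite fills an `INTEGER PRIMARY KEY` automatically; other engines use IDENTITY, SERIAL, AUTO_INCREMENT, or similar.

```python
import sqlite3

def add_metro(conn: sqlite3.Connection, display_name: str) -> int:
    """Hypothetical form handler: store the display name, return the generated key."""
    cur = conn.execute("INSERT INTO metro (name) VALUES (?)", (display_name,))
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
# SQLite assigns INTEGER PRIMARY KEY values itself when none is supplied.
conn.execute("CREATE TABLE metro (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
dc_id = add_metro(conn, "Washington, DC")
nyc_id = add_metro(conn, "New York City")
print(dc_id, nyc_id)  # 1 2
```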

Licensed under: CC-BY-SA with attribution