Question

What are the best practices for using email addresses as a primary key? Should I avoid it and use an auto-incremented ID number instead, or can the engine handle it just as well?

This is for a MySQL database, but I'm also interested in how other engines might handle this (PostgreSQL especially).

Solution

You should always have a unique integer primary key that has no business value. This is then referred to as a surrogate key.

You should store the email address itself in another field, frequently with an index so it can act as a key for lookups.

This lets you locate a user by looking up the email address. Any other functionality then uses that record's primary key for its operations, e.g. updating the user's address.
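
The pattern described above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL, and the table, column, and function names (users, find_user_id, update_address) are hypothetical.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL/PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id      INTEGER PRIMARY KEY,   -- surrogate key, no business meaning
        email   TEXT NOT NULL UNIQUE,  -- UNIQUE gives an index usable for lookups
        address TEXT
    );
""")
conn.execute(
    "INSERT INTO users (email, address) VALUES (?, ?)",
    ("user.name@domain.com", "1 Old Street"),
)

def find_user_id(conn, email):
    """Locate a user by email and return the surrogate key, or None."""
    row = conn.execute(
        "SELECT id FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None

def update_address(conn, user_id, address):
    """Every later operation addresses the row by its primary key."""
    conn.execute(
        "UPDATE users SET address = ? WHERE id = ?", (address, user_id)
    )

user_id = find_user_id(conn, "user.name@domain.com")
update_address(conn, user_id, "2 New Street")
```

The email is used exactly once, to find the row; everything afterwards works with the integer id.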

Other tips

It would be perfectly reasonable to use an email address only where a fairly narrow set of criteria are met:

  • The email address is the primary entity; it doesn't identify something else (like a user account, say)
  • There are relatively few FK references to the table or fast FK lookups without joins are vital
  • You don't validate email addresses in any way

In other words it's very rarely appropriate to use an email address as a primary key. The only situation I can really think of where it'd be sensible is software that processes a mail stream, where it wants to record stats about each individual email address.

If you're thinking of using it as an identifier for users, don't do it.

The email address itself is the primary entity

You're not using an email address to identify something else, like a user account, but instead have a table that's all about email addresses. Say you're keeping track of how many messages went to or from each address. If you're identifying something else with an email address, don't use it as a primary key. Use a surrogate key if there's no perfectly stable, small, and simple natural key. Names and email addresses change.

There are relatively few foreign key references

There aren't too many FK references to the table that has the email address as primary key, or you require very fast and join-free lookups in the tables with the FK. You can gain a big performance win if you're searching a table directly for a value (the email), rather than joining on another table and testing the other table for the value. The flip side here is that using email addresses instead of generated surrogate keys adds to the storage needed for tables (thus: bigger, slower tables and indexes) so it's only worthwhile if you really expect to search on foreign keys a lot.
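
To make the trade-off concrete, here is a rough sketch of the two query shapes, using Python's sqlite3 module as a stand-in engine. All table and function names are made up for illustration, and the example shows only the shape of the queries, not their relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant A: the email address is both the key and the FK value.
    CREATE TABLE addresses_a (email TEXT PRIMARY KEY);
    CREATE TABLE messages_a (
        id           INTEGER PRIMARY KEY,
        sender_email TEXT NOT NULL REFERENCES addresses_a(email)
    );
    CREATE INDEX messages_a_sender ON messages_a(sender_email);

    -- Variant B: a surrogate id is the key; the email lives only in the parent.
    CREATE TABLE addresses_b (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE messages_b (
        id        INTEGER PRIMARY KEY,
        sender_id INTEGER NOT NULL REFERENCES addresses_b(id)
    );
    CREATE INDEX messages_b_sender ON messages_b(sender_id);

    INSERT INTO addresses_a VALUES ('a@domain.com'), ('b@domain.com');
    INSERT INTO messages_a (sender_email)
        VALUES ('a@domain.com'), ('a@domain.com'), ('b@domain.com');
    INSERT INTO addresses_b (email) VALUES ('a@domain.com'), ('b@domain.com');
    INSERT INTO messages_b (sender_id) VALUES (1), (1), (2);
""")

def count_sent_a(conn, email):
    # Join-free: the child table is filtered on the email directly.
    return conn.execute(
        "SELECT COUNT(*) FROM messages_a WHERE sender_email = ?", (email,)
    ).fetchone()[0]

def count_sent_b(conn, email):
    # A join is needed to translate the email into the surrogate id.
    return conn.execute(
        """SELECT COUNT(*)
           FROM messages_b AS m
           JOIN addresses_b AS a ON a.id = m.sender_id
           WHERE a.email = ?""",
        (email,),
    ).fetchone()[0]
```

Both functions return the same count. Variant A avoids the join at the price of storing the full email text in every child row and in its index; variant B keeps the child rows small but needs the extra table.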

You don't validate email addresses

If you have such a concept as a "valid" or "invalid" email address your rules will change sooner or later, and you'll be in a miserable situation if you're using email addresses as primary keys.

Email addresses are weird

These three email addresses are the same:

user.name@DOMAIN.COM
user.name@DoMAIN.CoM
user.name@domain.com

but these three are all different:

user.name@domain.com
USER.NAME@domain.com
User.Name@domain.com

according to the relevant RFC (RFC 5321 treats the domain as case-insensitive and the local part as case-sensitive). Some MTAs agree, others treat the whole thing case-insensitively.
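
As a small sketch of what that rule means for comparisons, the hypothetical helper below lowercases only the domain part and leaves the local part alone. It is not a validator, and whether you should additionally fold the local part depends on the mail systems you deal with.

```python
def normalize_email(address: str) -> str:
    """Lowercase only the domain; leave the local part untouched."""
    # Split on the last "@" so the domain is everything after it.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain.lower()}"

# The first group collapses to one value, the second stays distinct.
same = {normalize_email(a) for a in (
    "user.name@DOMAIN.COM",
    "user.name@DoMAIN.CoM",
    "user.name@domain.com",
)}
different = {normalize_email(a) for a in (
    "user.name@domain.com",
    "USER.NAME@domain.com",
    "User.Name@domain.com",
)}
```

Any rule like this is exactly the kind of logic that tends to change later, which is awkward when the normalized value is also the primary key.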

Yeah. Don't use them as PKs.

Use an auto-incremented primary key. You do not need to expose this information to the user, you can represent it visually as if the key was the email address, but you need numbers that are internally consistent and do not change over time.

Remember your primary key is used for linking to other tables, so if someone changes their email address you would have to update all the dependent links as well. This is extremely difficult to get right.

It doesn't matter which SQL database you use; they all work roughly the same way and have similar limitations.

One important reason NOT to use business information as a primary key, and to use a surrogate primary key instead, is foreign keys. Imagine that someone's email address needs to be updated. Can you imagine what a pain it would be to update that value in all the foreign keys? If your foreign keys are strict enough, you may wind up having to make a duplicate record, update all the child records, then delete the original record. That's much harder to pull off than simply updating one record's email address, which is all you need to do if you use a surrogate primary key (usually an auto-generated integer).
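
The difference can be sketched with Python's sqlite3 module standing in for a real server. The assumptions here: foreign keys are declared without ON UPDATE CASCADE (the "strict" case), enforcement is switched on, and all table and function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- Natural key: child rows carry a copy of the email address.
    CREATE TABLE accounts_nat (email TEXT PRIMARY KEY);
    CREATE TABLE orders_nat (
        id            INTEGER PRIMARY KEY,
        account_email TEXT NOT NULL REFERENCES accounts_nat(email)
    );
    -- Surrogate key: child rows carry only the integer id.
    CREATE TABLE accounts_sur (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders_sur (
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES accounts_sur(id)
    );
    INSERT INTO accounts_nat VALUES ('old@domain.com');
    INSERT INTO orders_nat (account_email) VALUES ('old@domain.com');
    INSERT INTO accounts_sur (email) VALUES ('old@domain.com');
    INSERT INTO orders_sur (account_id) VALUES (1);
""")

def change_email_natural(conn, old, new):
    """Try to change the key in place; report whether the engine allowed it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts_nat SET email = ? WHERE email = ?", (new, old)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # a child row still references the old value

def change_email_natural_workaround(conn, old, new):
    """Duplicate the parent, repoint the children, delete the original."""
    with conn:
        conn.execute("INSERT INTO accounts_nat (email) VALUES (?)", (new,))
        conn.execute(
            "UPDATE orders_nat SET account_email = ? WHERE account_email = ?",
            (new, old),
        )
        conn.execute("DELETE FROM accounts_nat WHERE email = ?", (old,))

def change_email_surrogate(conn, account_id, new):
    """One-row update; no child table is touched."""
    with conn:
        conn.execute(
            "UPDATE accounts_sur SET email = ? WHERE id = ?", (new, account_id)
        )

direct_ok = change_email_natural(conn, "old@domain.com", "new@domain.com")
change_email_natural_workaround(conn, "old@domain.com", "new@domain.com")
change_email_surrogate(conn, 1, "new@domain.com")
```

With the natural key, the direct update is refused while child rows exist, so the three-step workaround has to touch every referencing table. With the surrogate key, the change is a single statement against a single row.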

Sensible criteria for choosing and designing keys are: Simplicity, Familiarity and Stability. Email addresses are simple and familiar to people who use them and they change relatively infrequently. Many successful websites and systems require unique email addresses to identify users. Email addresses make perfectly good keys for many purposes.

Given that an email address makes a suitable key, the question is whether it should be the primary key. The choice of a primary key arises when a table has more than one key and you wish to designate one of them as "preferred" for some purpose. Ideas about what should or should not become a primary key are fundamentally subjective and arbitrary. There is no sound theoretical basis on which to make such a choice, because a key designated as primary isn't required to be any different in form or function from any other key. On the basis of human preference alone, an email address ought to make a "better" choice of primary key than an unfamiliar and irrelevant incrementing number.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow