Question

Let's say we are to normalize a database into 3rd normal form using the requirement:

I need a movie ticket registry program that can remember customers and the tickets that they've purchased.

We might end up with a database like this:

ticket
    id
    movie_name
    price

customer
    id
    first_name

However, when I look at this, for some reason it looks redundant. What if I were to break it up into even smaller pieces, like this:

name
    id
    name

customer
    id
    fk_name_id

ticket
    id
    fk_name_id
    price

Would this be a good approach? Is there a name for this approach?


Solution

As Jordan says, the point of breaking data out into a separate table is to avoid redundant data.

As you apparently realize, we do NOT want to lay out our tables like this:

WRONG!!!

ticket
  customer_name
  movie_name

That would mean that customer_name is repeated for every movie the customer watches, and movie_name is repeated for every person who watches that movie. Lots and lots of redundant names. If the user has to type them in every time, sooner or later someone will misspell a name or use a variation of it, and we find that our table includes "Star Wars", "Star Wars IV", "Star Wars Episode IV", and "Stra Wars", all for the same movie. All sorts of problems.

By breaking the customer and the movie out into separate tables, we eliminate all the redundancy. Great. Celebrate.
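As a rough illustration of that layout, here is a minimal sketch using Python's built-in sqlite3 module. The column names customer_id and movie_id, the in-memory database, and the sample rows are assumptions made for the example, not part of the original question. It shows that once the movie title lives in one row, a typo is fixed with a single UPDATE and every ticket picks up the correction.

```python
import sqlite3

# In-memory database; all table/column names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE movie    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE ticket (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        movie_id    INTEGER NOT NULL REFERENCES movie(id),
        price       REAL NOT NULL
    );
""")

conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
# The title is typed once, with a typo.
conn.execute("INSERT INTO movie (id, name) VALUES (1, 'Stra Wars')")
conn.executemany(
    "INSERT INTO ticket (customer_id, movie_id, price) VALUES (?, ?, ?)",
    [(1, 1, 9.5), (2, 1, 9.5)],
)

# One UPDATE on the movie row corrects the title for every ticket.
conn.execute("UPDATE movie SET name = 'Star Wars' WHERE id = 1")

titles = [row[0] for row in conn.execute(
    "SELECT movie.name FROM ticket "
    "JOIN movie ON movie.id = ticket.movie_id ORDER BY ticket.id"
)]
print(titles)
```

With the "WRONG" layout above, the same correction would have to touch every ticket row that carries the misspelled title.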

But if we take your suggestion of making a "name" table that holds both customer names and movie names, did we eliminate any redundancy?

If a customer has the same name as a movie -- if we happen to have a customer named "Anna Karenina" or "John Carter" or whatever (or maybe someone named their kid "Batman Returns" for that matter) -- are you going to use the same record to store both? If not, then you have not removed any redundancy. You have just forced us to do an extra join every time we read the tables.

If you do use the same record, it's even worse. Suppose you create a record for customer "Anna Karenina" and you share the id/name record with the movie. Then Anna gets married and now her name is "Anna Smith". If you update the name record, you have not only changed the name of the customer, but also the title of the movie! This would be a very bad thing.
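The shared-record problem can be reproduced in a few lines. This is only a sketch with Python's sqlite3 module: it reuses the name, customer, and ticket tables and the fk_name_id columns from the question, while the sample rows and the price are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Tables follow the question's second design; the data is hypothetical.
    CREATE TABLE name     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer (id INTEGER PRIMARY KEY,
                           fk_name_id INTEGER NOT NULL REFERENCES name(id));
    CREATE TABLE ticket   (id INTEGER PRIMARY KEY,
                           fk_name_id INTEGER NOT NULL REFERENCES name(id),
                           price REAL NOT NULL);
""")

# One shared name row serves both the customer and the movie on the ticket.
conn.execute("INSERT INTO name (id, name) VALUES (1, 'Anna Karenina')")
conn.execute("INSERT INTO customer (id, fk_name_id) VALUES (1, 1)")
conn.execute("INSERT INTO ticket (id, fk_name_id, price) VALUES (1, 1, 8.0)")

# Anna gets married, so we update "her" name record.
conn.execute("UPDATE name SET name = 'Anna Smith' WHERE id = 1")

# The movie title on the ticket has silently changed as well.
movie_title = conn.execute(
    "SELECT name.name FROM ticket "
    "JOIN name ON name.id = ticket.fk_name_id WHERE ticket.id = 1"
).fetchone()[0]
print(movie_title)
```

Note also that even reading a ticket's movie title already costs a join that the simpler design would not need.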

You could, of course, say that when a name changes, instead of updating in place you create a new record for the new name. But then that defeats half the purpose of breaking the names out to a separate table. Suppose when we originally created the movie record we mistyped the name as "Anna Karina". Now someone points out our mistake and we fix it. But with the "make a new record every time there's a change" logic, we'd have to fix each ticket sale one by one.

I guess you could ask the user if this is a change for just the movie title, just the customer name, or both. But now we've added another level of complexity. And for what? Our program is more complex, our queries are more complex, and our user interface is more complex. In exchange, we get a tiny gain in saving disk space for the rare case where a customer coincidentally has the same name as a movie title.

Not worth it.

Other tips

Your first approach is not correct. If you think about the problem, there are three entities:

  • Movie
  • Customer
  • Ticket

The connection between Movie and Customer is really the Ticket table, so this is an example of an association or junction table that has additional information.
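To make the junction-table idea concrete, here is a small sketch with Python's sqlite3 module. The customer_id and movie_id columns and all sample rows are assumptions for illustration only. Each ticket row links one customer to one movie and carries its own attribute, the price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data, for illustration only.
    CREATE TABLE movie    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL);
    -- ticket is the association between customer and movie, plus a price.
    CREATE TABLE ticket (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        movie_id    INTEGER NOT NULL REFERENCES movie(id),
        price       REAL NOT NULL
    );
    INSERT INTO movie VALUES (1, 'Anna Karenina'), (2, 'John Carter');
    INSERT INTO customer VALUES (1, 'Anna'), (2, 'John');
    INSERT INTO ticket (customer_id, movie_id, price)
        VALUES (1, 2, 8.0), (2, 1, 8.0), (2, 2, 6.0);
""")

# "Which customer bought a ticket for which movie, and at what price?"
purchases = conn.execute("""
    SELECT customer.first_name, movie.name, ticket.price
    FROM ticket
    JOIN customer ON customer.id = ticket.customer_id
    JOIN movie    ON movie.id    = ticket.movie_id
    ORDER BY ticket.id
""").fetchall()

for first_name, movie_name, price in purchases:
    print(first_name, movie_name, price)
```

A customer and a movie may share a name here without any conflict, because each name is an attribute of its own entity.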

I wouldn't think of the problem as "there is an entity 'name' and customers and movies both have names". The name is an attribute of other entities, it is not its own entity (at least in this case).

Jay's answer is excellent, and should be chosen as the correct one IMHO.

However I wanted to add: normalization does not mean "storing like data in a separate structure". That is absolutely not the intent of normalization, and this is a mistake made by a lot of inexperienced database modelers, especially when they have a programming (OOP) background.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow