Spreading/distributing an entity into multiple tables instead of a single one

https://stackoverflow.com/questions/10187849

01-06-2021

Question

Why would anyone distribute an entity (for example user) into multiple tables by doing something like:

user(user_id, username)
user_tel(user_id, tel_no)
user_addr(user_id, addr)
user_details(user_id, details)

Is there any speed-up bonus you get from this DB design? It's highly counter-intuitive, because performing chained joins to retrieve the data sounds far worse than using a select projection on a single table.

Of course, if one performs other queries by making use only of the user_id and username, that's a speed-up, but is it worth it? So, where is the real advantage and what could be a compatible working scenario that's fit for such a DB design strategy?

LATER EDIT: for the purposes of this post, please assume a complete, unique entity whose attributes do not vary in quantity (e.g. a car has only one color, not two; a user has only one username/social security number/matriculation number/home address/email/etc.). That is, we're not dealing with a one-to-many relation, but with a 1-to-1, completely consistent description of an entity. The example above is just the case where a single table has been "split" into as many tables as it had non-primary-key columns.

Solution

By splitting the user in this way you have exactly 1 row in user per user, which links to 0..n rows each in user_tel, user_details and user_addr.

This in turn means that these can be considered optional, and/or each user may have more than one telephone number linked to them. All in all it's a more adaptable solution than hardcoding it so that users always have up to 1 address, up to 1 telephone number.
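As a rough illustration of the 0..n case, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the question; the sample users and telephone numbers are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    -- No UNIQUE constraint on user_id here, so a user may have 0..n numbers.
    CREATE TABLE user_tel (
        user_id INTEGER NOT NULL REFERENCES user(user_id),
        tel_no  TEXT NOT NULL
    );
    -- Hypothetical sample data: john has two numbers, mary has none.
    INSERT INTO user VALUES (1, 'john'), (2, 'mary');
    INSERT INTO user_tel VALUES (1, '555-0100'), (1, '555-0101');
""")

# LEFT JOIN keeps users that have no telephone row at all.
rows = conn.execute("""
    SELECT u.username, t.tel_no
    FROM user AS u
    LEFT JOIN user_tel AS t ON t.user_id = u.user_id
    ORDER BY u.user_id, t.tel_no
""").fetchall()
print(rows)
```

John comes back once per telephone number, and mary comes back once with no number, which is the optional/multiple behaviour described above.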

The alternative method is to have e.g. user.telephone1, user.telephone2 etc.; however, this goes against 3NF ( http://en.wikipedia.org/wiki/Third_normal_form ): essentially you are introducing a lot of columns to store the same kind of information.

edit

Based on the additional edit from OP, assuming that each user will have precisely 0 or 1 of each tel, address, details, and NEVER any more, then storing those pieces of information in separate tables is overkill. It would be more sensible to store them within a single user table with columns user_id, username, tel_no, addr, details.
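To make the comparison concrete, here is a small sketch (Python's built-in sqlite3, made-up sample data) that stores the same user both ways. The name user_single is only there so both designs can live in one database; in the split design, a PRIMARY KEY on user_id in each side table is one way to enforce "at most one row per user".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single-table design: optional attributes are nullable columns.
    CREATE TABLE user_single (
        user_id INTEGER PRIMARY KEY, username TEXT NOT NULL,
        tel_no TEXT, addr TEXT, details TEXT
    );
    INSERT INTO user_single VALUES (1, 'john', '555-0100', '1 Main St', 'admin');

    -- Split 1-to-1 design: user_id is the key of every side table.
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE user_tel (user_id INTEGER PRIMARY KEY REFERENCES user(user_id), tel_no TEXT);
    CREATE TABLE user_addr (user_id INTEGER PRIMARY KEY REFERENCES user(user_id), addr TEXT);
    CREATE TABLE user_details (user_id INTEGER PRIMARY KEY REFERENCES user(user_id), details TEXT);
    INSERT INTO user VALUES (1, 'john');
    INSERT INTO user_tel VALUES (1, '555-0100');
    INSERT INTO user_addr VALUES (1, '1 Main St');
    INSERT INTO user_details VALUES (1, 'admin');
""")

# One table: a plain projection.
single = conn.execute(
    "SELECT username, tel_no, addr, details FROM user_single WHERE user_id = 1"
).fetchone()

# Split tables: three joins to rebuild the same row.
split = conn.execute("""
    SELECT u.username, t.tel_no, a.addr, d.details
    FROM user AS u
    LEFT JOIN user_tel AS t ON t.user_id = u.user_id
    LEFT JOIN user_addr AS a ON a.user_id = u.user_id
    LEFT JOIN user_details AS d ON d.user_id = u.user_id
    WHERE u.user_id = 1
""").fetchone()

print(single == split)
```

Both queries return the same row; the split version just needs three joins to get there.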

If memory serves, this is perfectly fine within 3NF though. You stated this is not about normal forms; however, if each piece of data is considered directly related to that specific user, then it is fine to have it within the table.

If you later expanded the table to have telephone1, telephone2 (for example) then that would violate 1NF. If you have duplicate fields (e.g. multiple users share an address, which is entirely plausible), then that violates 2NF, which in turn violates 3NF.

This point about violating 2NF may well be why someone has done this.

Other suggestions

The author of this design perhaps thought that NULLs could be stored more efficiently in a "sparse" structure like this than "in-line" in the single table. The idea was probably to store rows such as (1, "john", NULL, NULL, NULL) just as (1, "john") in the user table and no rows at all in the other tables. For this to work, NULLs must greatly outnumber non-NULLs (and must be "mixed" in just the right way), otherwise this design quickly becomes more expensive.
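A back-of-the-envelope way to see where the break-even lies is to count stored values in each design. This is only a toy model, assuming every stored value costs the same and ignoring per-row overhead, indexes and how a particular engine actually stores NULLs; the function names are made up for the illustration.

```python
def cells_single(n_attrs: int) -> int:
    # One row: user_id + username + every optional column, NULL or not.
    return 2 + n_attrs


def cells_split(n_non_null: int) -> int:
    # (user_id, username) plus one (user_id, value) row per non-NULL attribute.
    return 2 + 2 * n_non_null


# Three optional attributes, as in the question: tel_no, addr, details.
for k in range(4):
    print(k, cells_single(3), cells_split(k))
```

Under these assumptions the split layout stores fewer values only when a user has at most one of the three attributes filled in, because every side-table row repeats user_id.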

Also, this could be somewhat beneficial if you constantly SELECT single columns. By splitting columns into separate tables, you make each table "narrower" from the storage perspective and lower the I/O in this specific case (but not in general).

The problems of this design, in my opinion, far outweigh these benefits.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow