Question

I am designing a database and am wondering which approach should I use. I am going to describe the database I intend to design and the possible approaches that I can use to store the data in the tables.

Please recommend which approach I should use, and why.

About the data:

A) I have seven attributes that need to be taken care of. These are just examples and not the actual ones I intend to store. Let me call them:

1) Name

2) DOB (Modified: I had earlier put age here)

3) Gender

4) Marital Status

5) Salary

6) Mother Tongue

7) Father's Name

B) There will be a minimum of 10,000 rows in the table, and the count can grow from there in the long term.

C) The number of attributes can change over time. That is, new attributes can be added to the existing dataset. No attributes will ever be removed.

Approach 1

Create a table with 7 attributes and store the data as it is. Add new columns if and when new attributes need to be added.

  • Pro: Easier to read the data and information is well organized

  • Con: There can be a lot of null values in certain rows for certain attributes for which values are unknown.
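As a sketch of Approach 1 in SQLite from Python (the table and column names here are only illustrative, not the asker's actual schema):

```python
import sqlite3

# In-memory database for illustration; one wide table, one column per attribute.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id             INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        dob            TEXT,           -- ISO-8601 date; derive age at query time
        gender         TEXT,
        marital_status TEXT,
        salary         NUMERIC,
        mother_tongue  TEXT,
        fathers_name   TEXT
    )
""")

# Attributes with unknown values simply become NULL in that row.
conn.execute(
    "INSERT INTO person (name, dob, gender) VALUES (?, ?, ?)",
    ("Tom", "1990-04-01", "Male"),
)

# Adding a new attribute later is a schema change:
conn.execute("ALTER TABLE person ADD COLUMN email TEXT")
```

Existing rows get NULL in the newly added column, which is exactly the trade-off the Con above describes.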

Approach 2

Create a table with three attributes. Let them be called:

1) Attr_Name: stores the attribute name, e.g. name, age, gender, etc.

2) Attr_Value: stores the value for the above attribute, e.g. Tom, 25, Male

3) Unique_ID: uniquely identifies the (name, value) pair in the database, e.g. SSN

So, in Approach 2, if new attributes need to be added for certain rows, we can just add them as new key-value pairs without worrying about null values.

  • Pro: Hashmap structure. Eliminates nulls.

  • Con: Data is not easy to read. Information cannot be easily grasped.

The Question

Which is the better approach?

I feel that Approach 1 is the better one, because it is not too tough to handle null values, the data is well organized, and it is easy to grasp this kind of data. Please suggest which approach I should use and why.

Thanks!

Solution 2

Your second option is one of the worst design mistakes you can make. It should only be used when you have hundreds of attributes that change constantly and are in no way the same from object to object (such as medical lab tests). If you need to do that, then do not under any circumstances use a relational database for it; NoSQL databases handle EAV designs far better than relational ones.

Another problem with design 2 is that it becomes almost impossible to maintain good data integrity, since you cannot properly enforce foreign keys and data types or add constraints to the data. Integrity should never be enforced only in the application, because things other than the application often affect the data; this factor alone is enough to make your second idea foolish and foolhardy.

The first design will perform better in general. It will be easier to write queries, and it will force you to think about what needs to change when you add an attribute (this is a plus, not a minus) instead of having to design everything to always show all attributes whether you need them or not.

If you would have a lot of nulls, then add a related table rather than more columns (you can have one-to-one related tables). Usually in this case you have something that you know only a subset of the records will have, and those attributes often fall into natural groupings by subject. For instance, you might have general people-related attributes (name, phone, email, address) that belong in one table, student-related attributes that belong in a second table, and teacher-related attributes that belong in a third. Or you might have things you need for all insurance policies, plus separate tables for vehicle insurance, health insurance, house insurance, and life insurance.

There is a third design possibility. If you have a set of attributes you know up front, then put them in one table and have an EAV table only for the attributes that cannot be determined at design time. This is the common pattern when an application wants the flexibility for users to add customer-specific data fields.
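A rough sketch of that third, hybrid design (table and column names are hypothetical): known attributes live in a typed, constrainable table, and only user-defined extras go in an EAV side table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Attributes known at design time: typed columns with real constraints.
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dob  TEXT
    );
    -- Only customer-defined extras go in the EAV side table.
    CREATE TABLE person_custom (
        person_id  INTEGER NOT NULL REFERENCES person(id),
        attr_name  TEXT NOT NULL,
        attr_value TEXT,
        PRIMARY KEY (person_id, attr_name)
    );
""")
conn.execute("INSERT INTO person (id, name, dob) VALUES (1, 'Tom', '1990-04-01')")
conn.execute("INSERT INTO person_custom VALUES (1, 'favorite_color', 'blue')")
```

The fixed columns keep their types and constraints, while the side table absorbs fields nobody could predict at design time.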

Other tips

This is a typical narrow table (attribute based) vs. wide table discussion. The problem with approach #2 is that you are probably going to have to pivot the data, to get it into a form the user can work with (back into a wide view format). This can be very resource intensive as the number of rows grows, and as the number of attributes grows. It's also hard to look at the table, in raw table view, and see what's going on.
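The pivot described above is typically done with conditional aggregation. A sketch against a hypothetical `person_attr` key-value table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_attr (entity_id TEXT, attr_name TEXT, attr_value TEXT)")
conn.executemany("INSERT INTO person_attr VALUES (?, ?, ?)", [
    ("1", "name", "Tom"), ("1", "gender", "Male"),
    ("2", "name", "Ann"), ("2", "gender", "Female"),
])

# One MAX(CASE ...) branch per attribute: the query itself must grow
# every time a new attribute appears, and every row must be scanned.
wide = conn.execute("""
    SELECT entity_id,
           MAX(CASE WHEN attr_name = 'name'   THEN attr_value END) AS name,
           MAX(CASE WHEN attr_name = 'gender' THEN attr_value END) AS gender
    FROM person_attr
    GROUP BY entity_id
    ORDER BY entity_id
""").fetchall()
# wide -> [('1', 'Tom', 'Male'), ('2', 'Ann', 'Female')]
```

This is why the cost grows with both the row count and the attribute count: every new attribute adds a branch, and the GROUP BY touches every attribute row.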

We have had this discussion many times at our company. We have some tables that lend themselves very well to an attribute-type schema. We've always decided against it because of the necessity to pivot the data and the inability to view the data and have it make sense (but this is the lesser of the two problems for us; we just don't want to pivot millions of rows of data).

BTW, I wouldn't store age as a number; I would store the birth date, if you have it. Also, I don't know what 'Mother Tongue' refers to, but if it's the language the mother speaks, I would store it as a foreign key to a master language table. That is more efficient and reduces the risk of bad data from a misspelled language.
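A minimal sketch of that lookup-table suggestion (table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE language (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE person (
        id               INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        dob              TEXT,  -- store the birth date, not the age
        mother_tongue_id INTEGER REFERENCES language(id)
    );
""")
conn.execute("INSERT INTO language (id, name) VALUES (1, 'English')")
conn.execute("INSERT INTO person VALUES (1, 'Tom', '1990-04-01', 1)")

# A misspelled or unknown language is now rejected instead of stored as bad data.
try:
    conn.execute("INSERT INTO person VALUES (2, 'Ann', '1992-06-15', 99)")
except sqlite3.IntegrityError:
    pass  # 99 is not a valid language id
```

This is exactly the kind of constraint the EAV design cannot express, since its single value column has no per-attribute type or foreign key.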

I don't think anyone can really determine which one is better immediately, but here are a couple of things to think about:

  1. Do you have sample data? If yes, see whether there will be a lot of nulls; if there are not, just go with option 1.
  2. Do you have a good sense of how the attributes will grow? Looking at the attributes you listed above, you may not know all of them for a given person, but they all do exist, so in theory you could fill the table. If you will have a lot of sparse data, then #2 may work.
  3. When you do get new types of data, can you group them into another table and use a foreign key? For instance, if you want to capture the address, you could always have an address table that references your initial table.
  4. What types of queries do you plan on using? It's much harder to query a key-value table than a "normal" one (not super hard, just harder; if you're comfortable using implied joins and the like to normalize the data, then it's probably not a big deal).

Overall, I'd be really careful before implementing #2. I've done it for certain specialized cases (metrics gathering, where I have dozens of different metrics and don't really want to maintain dozens of different tables), but in general it's more trouble than it's worth.

For something like this I'd just create one table, and either add columns as you go along, or just create new tables for new data structures if necessary.

License: CC BY-SA with attribution