Question

I am designing a database and am wondering which approach should I use. I am going to describe the database I intend to design and the possible approaches that I can use to store the data in the tables.

Please recommend which approach I should use, and why.

About the data:

A) I have seven attributes that need to be taken care of. These are just examples and not the actual ones I intend to store. Let me call them:

1) Name

2) DOB (Modified: I had earlier put age here)

3) Gender

4) Marital Status

5) Salary

6) Mother Tongue

7) Father's Name

B) There will be a minimum of 10,000 rows in the table, and the count can grow from there in the long term.

C) The number of attributes can change over time. That is, new attributes can be added to the existing dataset. No attributes will ever be removed.

Approach 1

Create a table with 7 attributes and store the data as it is. Add new columns if and when new attributes need to be added.

  • Pro: Easier to read the data and information is well organized

  • Con: There can be a lot of null values in certain rows for certain attributes for which values are unknown.
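As a sketch of Approach 1 in SQLite from Python (the table and column names here are only illustrative, not the asker's actual schema):

```python
import sqlite3

# In-memory database for illustration; one wide table, one column per attribute.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id             INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        dob            TEXT,           -- ISO-8601 date; derive age at query time
        gender         TEXT,
        marital_status TEXT,
        salary         NUMERIC,
        mother_tongue  TEXT,
        fathers_name   TEXT
    )
""")

# Attributes with unknown values simply become NULL in that row.
conn.execute(
    "INSERT INTO person (name, dob, gender) VALUES (?, ?, ?)",
    ("Tom", "1990-04-01", "Male"),
)

# Adding a new attribute later is a schema change:
conn.execute("ALTER TABLE person ADD COLUMN email TEXT")
```

Existing rows get NULL in the newly added column, which is exactly the trade-off the Con above describes.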

Approach 2

Create a table with three attributes. Let them be called:

1) Attr_Name: stores the attribute name, e.g. name, age, gender, etc.

2) Attr_Value: stores the value for the above attribute, e.g. Tom, 25, Male

3) Unique_ID: uniquely identifies the (name, value) pair in the database, e.g. SSN

So, in Approach 2, if new attributes need to be added for certain rows, we can just add them as new key-value pairs without worrying about null values.

  • Pro: Hashmap structure. Eliminates nulls.

  • Con: Data is not easy to read. Information cannot be easily grasped.

The Question

Which is the better approach?

I feel that Approach 1 is the better one, because it is not too tough to handle null values, the data is well organized, and it is easy to grasp this kind of data. Please suggest which approach I should use and why.

Thanks!

Solution 2

Your second option is one of the worst design mistakes you can make. It should only be used when you have hundreds of attributes that change constantly and are in no way the same from object to object (such as medical lab tests). If you need to do that, then do not under any circumstances use a relational database for it; NoSQL databases handle EAV designs far better than relational ones.

Another problem with design 2 is that it becomes almost impossible to maintain good data integrity, since you cannot properly enforce foreign keys and data types or add constraints to the data. Integrity should never be enforced only in the application, because things other than the application often affect the data; this factor alone is enough to make your second idea foolish and foolhardy.

The first design will perform better in general. It will be easier to write queries, and it will force you to think about what needs to change when you add an attribute (this is a plus, not a minus) instead of having to design everything to always show all attributes whether you need them or not.

If you would have a lot of nulls, then add a related table rather than more columns (you can have one-to-one related tables). Usually in this case you have something that you know only a subset of the records will have, and those attributes often fall into natural groupings by subject. For instance, you might have general people-related attributes (name, phone, email, address) that belong in one table, student-related attributes that belong in a second table, and teacher-related attributes that belong in a third. Or you might have things you need for all insurance policies, plus separate tables for vehicle insurance, health insurance, house insurance, and life insurance.

There is a third design possibility. If you have a set of attributes you know up front, then put them in one table and have an EAV table only for the attributes that cannot be determined at design time. This is the common pattern when an application wants the flexibility for users to add customer-specific data fields.
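A rough sketch of that third, hybrid design (table and column names are hypothetical): known attributes live in a typed, constrainable table, and only user-defined extras go in an EAV side table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Attributes known at design time: typed columns with real constraints.
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dob  TEXT
    );
    -- Only customer-defined extras go in the EAV side table.
    CREATE TABLE person_custom (
        person_id  INTEGER NOT NULL REFERENCES person(id),
        attr_name  TEXT NOT NULL,
        attr_value TEXT,
        PRIMARY KEY (person_id, attr_name)
    );
""")
conn.execute("INSERT INTO person (id, name, dob) VALUES (1, 'Tom', '1990-04-01')")
conn.execute("INSERT INTO person_custom VALUES (1, 'favorite_color', 'blue')")
```

The fixed columns keep their types and constraints, while the side table absorbs fields nobody could predict at design time.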

Other tips

This is a typical narrow table (attribute based) vs. wide table discussion. The problem with approach #2 is that you are probably going to have to pivot the data, to get it into a form the user can work with (back into a wide view format). This can be very resource intensive as the number of rows grows, and as the number of attributes grows. It's also hard to look at the table, in raw table view, and see what's going on.
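The pivot described above is typically done with conditional aggregation. A sketch against a hypothetical `person_attr` key-value table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_attr (entity_id TEXT, attr_name TEXT, attr_value TEXT)")
conn.executemany("INSERT INTO person_attr VALUES (?, ?, ?)", [
    ("1", "name", "Tom"), ("1", "gender", "Male"),
    ("2", "name", "Ann"), ("2", "gender", "Female"),
])

# One MAX(CASE ...) branch per attribute: the query itself must grow
# every time a new attribute appears, and every row must be scanned.
wide = conn.execute("""
    SELECT entity_id,
           MAX(CASE WHEN attr_name = 'name'   THEN attr_value END) AS name,
           MAX(CASE WHEN attr_name = 'gender' THEN attr_value END) AS gender
    FROM person_attr
    GROUP BY entity_id
    ORDER BY entity_id
""").fetchall()
# wide -> [('1', 'Tom', 'Male'), ('2', 'Ann', 'Female')]
```

This is why the cost grows with both the row count and the attribute count: every new attribute adds a branch, and the GROUP BY touches every attribute row.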

We have had this discussion many times at our company. We have some tables that lend themselves very well to an attribute-type schema. We've always decided against it because of the necessity to pivot the data and the inability to view the data and have it make sense (but this is the lesser of the two problems for us; we just don't want to pivot millions of rows of data).

BTW, I wouldn't store age as a number; I would store the birth date, if you have it. Also, I don't know what 'Mother Tongue' refers to, but if it's the language the mother speaks, I would store it as a foreign key to a master language table. That is more efficient and reduces the risk of bad data from a misspelled language.
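A minimal sketch of that lookup-table suggestion (table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE language (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE person (
        id               INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        dob              TEXT,  -- store the birth date, not the age
        mother_tongue_id INTEGER REFERENCES language(id)
    );
""")
conn.execute("INSERT INTO language (id, name) VALUES (1, 'English')")
conn.execute("INSERT INTO person VALUES (1, 'Tom', '1990-04-01', 1)")

# A misspelled or unknown language is now rejected instead of stored as bad data.
try:
    conn.execute("INSERT INTO person VALUES (2, 'Ann', '1992-06-15', 99)")
except sqlite3.IntegrityError:
    pass  # 99 is not a valid language id
```

This is exactly the kind of constraint the EAV design cannot express, since its single value column has no per-attribute type or foreign key.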

I don't think anyone can really determine which one is better immediately, but here are a couple of things to think about:

  1. Do you have sample data? If yes, see whether there will be a lot of nulls; if there are not, just go with option 1.
  2. Do you have a good sense of how the attributes will grow? Looking at the attributes you listed above, you may not know all of them for a given person, but they all do exist, so in theory you could fill the table. If you will have a lot of sparse data, then #2 may work.
  3. When you do get new types of data, can you group them into another table and use a foreign key? For instance, if you want to capture the address, you could always have an address table that references your initial table.
  4. What types of queries do you plan on using? It's much harder to query a key-value table than a "normal" one (not super hard, just harder; if you're comfortable using implied joins and the like to normalize the data, then it's probably not a big deal).

Overall, I'd be really careful before implementing #2. I've done it for certain specialized cases (metrics gathering, where I have dozens of different metrics and don't really want to maintain dozens of different tables), but in general it's more trouble than it's worth.

For something like this I'd just create one table, and either add columns as you go along, or just create new tables for new data structures if necessary.

License: CC BY-SA with attribution