Question

In MySQL, is it generally faster/more efficient/scalable to return 100 rows with 3 columns, or 1 row with 100 columns?

In other words, when storing many key => value pairs related to a record, is it better to store each key => value pair in a separate row with with the record_id as a key, or to have one row per record_id with a column for each key?

Also, assume also that keys will need to be added/removed fairly regularly, which I assume would affect the long term maintainability of the many column approach once the table gets sufficiently large.

Edit: to clarify, by "a regular basis" I mean the addition or removal of a key once a month or so.

Was it helpful?

Solution

You should never add or remove columns on a regular basis.

OTHER TIPS

http://en.wikipedia.org/wiki/Entity-Attribute-Value_model

There are a lot of bad things about this model and I would not use it if there was any other alternative. If you don't know the majority (except a few user customizable fields) of data columns you need for your application, then you need to spend more time in design and figure it out.

If your keys are preset (known at design time), then yes, you should put each key into a separate column.

If they are not known in design time, then you have to return your data as a list of key-value pairs which you should later parse outside the RDBMS.

If you are storing key/value pairs, you should have a table with two columns, one for the key (make this the PK for the table) and one for the value (probably don't need this indexed at all). Remember, "The key, the whole key, and nothing but the key."

In the multi-column approach, you will find that you table grows without bound because removing the column will nuke all the values and you won't want to do it. I speak from experience here having worked on a legacy system that had one table with almost 1000 columns, most of which were bit fields. Eventually, you stop being able to make the case to delete any of the columns because someone might be using it and the last time you did it, you had work till 2 am rolling back to backups.

First: determine how frequently your data needs to be accessed. If the data always needs to be retrieved in one shot and most of it used then consider storing all the key pairs as a serialized value or as an xml value. If you need to do any sort of complex analysis on that data and you need the value pairs then columns are ok but limit them to values that you know you will need to perform your queries on. It’s generally easier to design queries that use one column for one parameter than row. You will also find it easier to work with the returned values if they are all in one row than many.

Second: separate your most frequently accessed data and put it in its own table and the other data in another. 100 columns is a lot by the way so I recommend that you split your data into smaller chunks that will be more manageable.

Lastly: If you have data that may frequently change then you should use create the column (key) in one table and then use its numerical key value against which you would store the key value. This assumes that you will be using the same key more than once and should speed up your search when you go to do your lookup.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top