Question

I'm developing software for conducting online surveys. When a lot of users are filling in a survey simultaneously, I'm having trouble handling the high database write load. My current table (MySQL, InnoDB) for storing survey data has the following columns: dataID, userID, item_1 .. item_n. The item_* columns have different data types corresponding to the type of data acquired by the specific items. Most item columns are TINYINT(1), but there are also some TEXT item columns. Large surveys can have more than a hundred items, leading to a table with more than a hundred columns. A user answers around 20 items in one HTTP POST, and the corresponding row has to be updated accordingly. The user may skip a lot of items, leading to a lot of NULL values in the row.
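For illustration, a minimal sketch of the current wide-table layout, assuming one row per user; the specific item columns shown are hypothetical stand-ins for item_1 .. item_n:

```sql
-- Sketch of the current wide table: one row per user, one column per item.
CREATE TABLE survey_data (
    dataID  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    userID  INT UNSIGNED NOT NULL,
    item_1  TINYINT(1)   NULL,
    item_2  TINYINT(1)   NULL,
    item_3  TEXT         NULL
    -- ... up to item_n, often more than a hundred columns
) ENGINE=InnoDB;

-- Each HTTP POST updates ~20 of these columns in a single row:
UPDATE survey_data
SET item_1 = 1, item_2 = 0, item_3 = 'free text answer'
WHERE userID = 42;
```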

I'm considering the following solution to my write load problem. Instead of having a single table with many columns, I would set up several tables corresponding to the data types used, e.g. data_tinyint_1, data_smallint_6, data_text. Each of these tables would have only the following columns: userID, itemID, value (the value column has the data type corresponding to its table). For one HTTP POST with e.g. 20 items, I might then have to create 19 rows in data_tinyint_1 and one row in data_text (instead of updating one large row with many columns). However, for every item I need to determine its data type (via two table joins) so I know in which table to create the new row. My Zend Framework-based application code will get more complicated with this approach.
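A minimal sketch of that proposed per-type layout; the table and column names come from the question, while the primary keys are an assumption (one answer per user per item):

```sql
-- Sketch of the proposed per-type tables: one row per answered item.
CREATE TABLE data_tinyint_1 (
    userID INT UNSIGNED NOT NULL,
    itemID INT UNSIGNED NOT NULL,
    value  TINYINT(1)   NOT NULL,
    PRIMARY KEY (userID, itemID)   -- assumed: one answer per user per item
) ENGINE=InnoDB;

CREATE TABLE data_text (
    userID INT UNSIGNED NOT NULL,
    itemID INT UNSIGNED NOT NULL,
    value  TEXT         NOT NULL,
    PRIMARY KEY (userID, itemID)
) ENGINE=InnoDB;

-- One HTTP POST then becomes a handful of inserts instead of one wide UPDATE:
INSERT INTO data_tinyint_1 (userID, itemID, value)
VALUES (42, 1, 1), (42, 2, 0), (42, 4, 3);

INSERT INTO data_text (userID, itemID, value)
VALUES (42, 3, 'free text answer');
```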

My questions:

  1. Will my solution be better for heavy write load?
  2. Do you have a better solution?

Solution

Since you're getting to the point of abstracting this schema to mimic actual data types, it may make more sense to simply create a new set of tables per survey instead. The benefit is that locking will lessen, and you could isolate heavy loads onto separate machines if the load becomes unbearable.

The single-survey database structure can then more accurately reflect your real-world conditions and data input handlers, and it ought to make your abstraction headaches go away.

There's nothing wrong with creating tables on the fly. In some configurations, soft sharding is preferable.
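A minimal sketch of that per-survey approach, assuming each survey's items are known when it is published; the table and column names here are illustrative, not from the answer:

```sql
-- Sketch: one table generated per survey, created when the survey is published.
-- Each survey table carries only the columns its own items need.
CREATE TABLE survey_123_responses (
    userID INT UNSIGNED NOT NULL PRIMARY KEY,
    q1     TINYINT(1)   NULL,
    q2     TINYINT(1)   NULL,
    q3     TEXT         NULL
    -- ... only the items defined for survey 123
) ENGINE=InnoDB;

-- A heavily used survey can then live in its own schema or on another server,
-- which is what makes this a form of soft sharding.
```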

OTHER TIPS

The obvious solution here would be to use a document database for fast writes and then bulk-insert the answers into MySQL asynchronously, using cron or something like that. You can create a view in the document database for quick statistics, but allow filtering and other complicated queries only in MySQL if you're not a fan of document DBMSs.
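A minimal sketch of the MySQL side of that pipeline, assuming the periodic job drains queued answers from the document store and writes them in batches; the target table reuses the per-type layout from the question and is illustrative:

```sql
-- Sketch: the cron job flushes queued answers into MySQL in one batch,
-- turning many small synchronous writes into a few large asynchronous ones.
INSERT INTO data_tinyint_1 (userID, itemID, value)
VALUES
    (42, 1, 1),
    (42, 2, 0),
    (57, 1, 3),
    (57, 4, 1)
ON DUPLICATE KEY UPDATE value = VALUES(value);  -- re-submitted answers overwrite
```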

Licensed under: CC-BY-SA with attribution