Question

I am relatively new to database architecture, and my group has tasked me with designing a database that will store 250 values per entry. This is pushing the upper limit for fields in an Access recordset, so my first instinct is to break the data up into more manageable tables, after reading answers like these. No entry will be changed after submission, as this database is intended for analysis and data reuse in other applications. While I do not think additional fields will be required, I feel it would be best to allow room for the possibility.

I understand that any application built to query this database would execute marginally faster if there is one table. Is there any reason, other than this, that a single table would be preferable?

Solution

Indeed, you're close to the limit. So close that you might very soon hit it for good. In that case, refactoring all your code to work with several tables would be time consuming, whereas anticipating the situation now would add only a very small overhead. So better think twice.

In your situation, you could just create two tables, using a one-to-one relation. The split of columns/fields could be arbitrary, but you could have a closer look at the columns (a code sketch follows the list):

  • Maybe you can discover some groups of columns that are at least logically related?
  • Maybe you can discover some groups of columns that are often either all filled or all empty together. This would be a typical symptom of single table inheritance.
  • Maybe the one table is even an artificial design simplification that mentally joins several independent but related tables. In my own experience, most of the very large tables fall into this category.
  • Maybe you can find naming patterns in the column names (e.g. planned_start, planned_end, real_start, real_end) that suggest a non-obvious separate table (table key_dates: id, plan_or_real, start, end).
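
A minimal sketch of such a split, using SQLite from Python purely for illustration (your target is Access, whose DDL differs slightly; every table and column name below is a hypothetical example, including the key_dates idea from the last bullet):

    import sqlite3

    conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
    cur = conn.cursor()

    # Core table: holds the id and the most central columns.
    cur.execute("""
        CREATE TABLE measurements_core (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            sample_name TEXT NOT NULL,
            recorded_on TEXT NOT NULL
        )
    """)

    # Extension table in a one-to-one relation: its primary key is also a
    # foreign key to the core table, so each core row has at most one
    # extension row.
    cur.execute("""
        CREATE TABLE measurements_ext (
            id INTEGER PRIMARY KEY REFERENCES measurements_core(id),
            extra_value_1 REAL,
            extra_value_2 REAL
        )
    """)

    # The planned_start/planned_end/real_start/real_end naming pattern
    # becomes rows of a key_dates table instead of four columns.
    cur.execute("""
        CREATE TABLE key_dates (
            id INTEGER REFERENCES measurements_core(id),
            plan_or_real TEXT CHECK (plan_or_real IN ('plan', 'real')),
            start TEXT,
            "end" TEXT,  -- END is an SQL keyword, so the column name is quoted
            PRIMARY KEY (id, plan_or_real)
        )
    """)
    conn.commit()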

The advantages of one table are:

  • simplicity of reports (always based on a select on the single table)
  • simplicity of queries (no joins)
  • performance of the queries (but the overhead of a join based on indexed columns is marginal)
  • simplicity of data acquisition, if the many columns are provided by an existing file format
  • no analysis risk (if some supposed relations between columns prove to be wrong, there will be no issue with the single table).
  • no consistency risk (the existence of other records and the validity of ids across tables do not need to be verified).
  • you don't need to manage the id: you can easily use an automatic id (with two tables, one table would have the auto id, but you'd need some code to copy that auto id into the second table, which is only possible after the first record has been inserted; see the sketch below).
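
As a sketch of that last point, again using Python with SQLite purely for illustration (the hypothetical table names come from the sketch above): the auto id only exists after the insert into the first table, so the second insert reuses it.

    def insert_entry(conn, core_values, ext_values):
        # conn is an sqlite3.Connection holding the two tables sketched above.
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO measurements_core (sample_name, recorded_on) VALUES (?, ?)",
            core_values,
        )
        new_id = cur.lastrowid  # the auto id generated by the first insert
        cur.execute(
            "INSERT INTO measurements_ext (id, extra_value_1, extra_value_2) VALUES (?, ?, ?)",
            (new_id, *ext_values),
        )
        conn.commit()
        return new_id

    # Usage: insert_entry(conn, ("sample-42", "2024-01-01"), (1.5, 2.5))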

But despite all these advantages, and unless I had to provide a solution without any code, I would nevertheless go for a solution with several tables rather than start with so little margin to add more.

OTHER TIPS

250 fields per record? Has there ever been such a case that was legitimate?

Even in the highly improbable case where the data was suited to being modelled and handled as a single base table with 250 columns, the programmer's simple need for some organisation of the columns would lead them to split the data across multiple tables with a one-to-one relation.

It is also highly improbable that data entry would occur 250 fields at a time, or that any query of stored data would usually require all 250 fields to be returned to a user at once.

Most of us would feel things were getting out of hand whenever records with more than thirty or forty fields are being handled at once by any person, whether the programmer or the user.

Licensed under: CC-BY-SA with attribution