Question

I have many tables that use Lookup/Enum references for most of their column values. For example:
Person Table - PersonID | RaceCode | HairColorCode | HairStyleCode | TeethConditionCode
Location Table - LocationID | SizeCode | ExteriorColorCode | ConditionCode
Things like Race,Size,Color,Condition,etc would just be foreign key references to a Code lookup table. This code table has other fields but aren't important for my question. The database is for a SaaS application which means each client could have their own list of of Colors, Races, Conditions, etc. There are some codes that would be static which clients couldnt change.

Is it better to have 1 code table or 2 types of code tables (DynamicCodeTable for customer defined ones and StaticCodeTable for those that one change) or should I have a table for each code type (RaceCodeTable, HairColorTable, Condition, etc) ?

The thing I am the most worried about is all the sql joins. The Person table I am working with has 20+ of these code attributes. Is there a difference in performance when joining to 20 different tables VS joining to the same table 20 times? Having multiple tables means each table would be smaller and the lookup 'should' take less time. But having a single table could be quick too. Any suggestions?

Was it helpful?

Solution

Without knowing more about the application or requirements I would recommend having one table for each code type. IMO the database design would be more clear and self documenting to have foreign keys for each type of code you have.

OTHER TIPS

This topic has been discussed at length over the last fifteen years, under the subject "One True Lookup Table" (abbreviated OTLT). The advantages of such an approach leap out to the database newbie. The drawbacks emerge over time. See these links for OTLT drawbacks:

Or search for OTLT to find more discussions.

If you create many lookup tables, and many maintenance screens for them, you can create a view that simulates the OTLT by creating a gigantic UNION that includes every code, every description, and the name of the table where the code-description pair is stored. It's possible to generate such a union using semiautomatic methods, if you know what you're doing. I would imagine that semiautomatic methods would enable you to build a single maintenance screen for hundreds of lookup tables, and then put some logic between that screen and the tables that would insert a new code in the correct table.

As to letting the users introduce new code TYPES, and not just new code VALUES, that opens a whole big can of worms. See the above article discussing EAV. This is very seductive, because it allows the users to design their own underlying data structure. If you disregard performance, this works pretty well for a while. You get a perfectly general database without having to learn the data structure from the users or the subject matter experts.

When it runs into real grief is when you try to use the data as if it were an integrated database, and not just a hodge podge of disjointed opinions about the data. At this point, you're into some serious data archaeology, when your customers expect routine report generation. Good luck.

(Editted to change "data mining" to "data archaeology")

I made a mistake of thinking all these lookup tables would be a great idea when redesigning our pretty wide tables. So much flexibility, etc but it ended up being much harder to code for, it was impossible to navigate around, and it was just a pain in the ass.

So what did I learn?

  • for static values, just use an enum - it's a lot faster and more convenient. This decision has to be made depending on how many other tables may refer to the same variable.
  • stick with fewer lookup tables rather than creating as many as you can think of. JOINs are much slower.
  • to help yourself navigate around, design database VIEWs. It will make your life a lot easier.
  • as a bonus, if you don't want your clients touching certain tables (i.e. your static ones), or touching enum column values, you can use the MySQL (for example) fine-grained permissions to disable changes to certain columns in certain tables. A lot of people don't realize how flexible these permissions can get.

There's a potential performance difference.

A table with just 2 rows ties up a lot of space in cache for those two tiny rows.

If you have a lot of lookup values in a single table, you -- effectively -- pack those values more densely into cache.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top