DB design and performance: Is it OK to use redundant FK to increase performance?

https://stackoverflow.com/questions/12004210

26-06-2021
|

Question

Lets say I have following DB structure:
Normalized DB structure
and my application needs to show the list of articles with all the details (model, product family, brand, producer). For that I would need to make more JOINs to get needed data.

Is it OK if I increase application's performance by creating redundant FKs to the Article table like follow? Does it actually increase the performance?
DB structure with redundant FKs

Solution

Yes you can increase performance that way if you don't want to retrieve any data of "intermediate" objects in the hierarchy. This is a common form of denormalization. Notice, that you need to be careful not to let inconsistencies slip in.

I usually set up a nightly task that verifies the denormalized data, mails errors to me and automatically fixes them. This is not hard to do and eliminates a nasty class of bugs.

A different reason why people do is is to partition all tables on the same key.

OTHER TIPS

The best way to find out if a design improves performance is to try it; the second best way is to think through the queries you're likely to need to run, and then try to model them in your head. Without knowing what queries you want to run, or how large your database is, it's hard to know whether you will see a performance improvement.

In very general terms, I'd say you won't see a measurable impact on performance unless you have a very large database (assuming you're running this on decent hardware, and that you've tuned your indexes). By "very large", I'm thinking millions of rows in several tables.

If you really do need to de-normalize, my advice is to create an explicitly denormalized table, rather than "pollute" your regular design with redundant keys. It's a lot easier to understand a design that's factored into separate "how it should be" and "compromises", rather than mix the two together.

To achieve that, I'd create a separate table - "cached_articles" perhaps, with columns:

article_id
...(article data)
model_id
....(model data)
family_id
...(family data)
brand_id
....(brand data)
producer_id
....(producer data)

You can maintain this table through batch jobs or triggers. You should only ever have your application code write to the normalized tables, and only read from the cache tables when you need to.

You should also build a robust "consistency check" mechanism to identify data issues which could cause the application to break; those consistency checks become a big deal once your database grows to the size where this kind of design becomes necessary, because they run into the same performance problems...

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow