Question

We're drawing up the database structure with the help of mySQL Workbench for a new app and the number of joins required to make a listing of the data is increasing drastically as the many-to-many relationships increases.

The application will be quite read-heavy and have a couple of hundred thousand rows per table.

The questions:

  • Is it really that bad to merge tables where needed and thereby reducing joins?

  • Should we start looking at horizontal partitioning? (in conjunction with merging tables)

  • Is there a better way then pivot tables to take care of many-to-many relationships?

  • We discussed about instead storing all data in serialized text columns and having the application make the sorting instead of the database, but this seems like a very bad idea, even though that the database will be heavily cached. What do you think?

Was it helpful?

Solution

Go with the normalized form of the database. For most part of the tasks you won't need more than 3 or 4 Joins and you still can write views for the most common joins. Denormalization will have you to always think of updating fields in multiple places/tables when changing one property and will surely lead to more problems than benefits.

If you worry about reporting performance then you still can extract the data in timed batches into separate tables to get the desired performance for your reporting queries. If it's for query simplicity you can use views.

OTHER TIPS

In inverse order:

  • Forget it. Use the database. People saynig "make it in the application" are pretty often those ignorant to the amount of work going into writing databases.

  • Depends on exact need.

  • Depends on exact need. OLTP (Transaction processing) - go for for firth normal form. OLAP (Analytical processing) - go for a proper star diagram and denormalize to get optimal performance. Mixed - forget it. Does not work for larger installs because the theories are different... except if you make the database OLTP and then use a special OLAP cube database (which mySQL does not have).

Databases are designed to handle lots of joins. Use this feature as it will make many kinds of data manipulation in the database much easier. Otherwise, why not just use a flat file?

As always, it depends on your application, but in general, too much denormalisation can come back and bite you later on. A well normalised database means that you should be able to query your data in most ways that you may need later on, particularly for reporting (which often is an afterthought).

If you stick all your data in serialized text columns and your client asks for a report showing all rows that have a particular attribute, then you're going to have to do a bunch of string manipulation to get this data out.

If you're worried about too many joins for your queries, you could consider exposing certain sets of the data as a view...

If you make sure to index the foreign keys (you did set up foreign keys didn't you?) and have proper where clauses in your queries, 10-15 joins should be easily handled by a database. Especially with so few rows. I have queries with that many joins on tables with millions of rows and they run fine.

Usually it is better to partition data than to denormalize.

As far as denomalizing goes, don't do it unless you also institute a strategy for keeping the denormalized data in synch with the parent table.

As to whether you really need that many tables or if your design is bad, well the only way we could comment on that is if we saw the table structure.

Unless you have clear evidence that performance is suffering because of the joins, stay normalised. Otherwise, as others have said, you'll have to worry about multiple updates.

Especially if the database is heavily cached, as you say, you'll be surprised how quick the DBMS is at doing this kind of thing - it is what it's designed for, after all.

Unless it's the sort of monster application, with huge amounts of data, that demands special performance optimisations, you'll find that keeping down the development, testing, and later, maintenance effort, will be much more important.

Joins are good, usually, not bad. They allow you to keep the data where it should be, which gives you maximum flexibility.

And as has been said many times, premature optimisation is usually bad, not good.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top