Question

This is a more theoretical question, not a specific scenario:

Let's assume we have a simplified table schema like this:

[schema diagram: items, item_data, rel_items]

items contains some basic data, item_data holds additional properties for each item, and rel_items sets up a tree relationship between the different items. There are different types of items (represented by the field items.item_type) which have different fields stored in item_data, for example: dog, cat, mouse.
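As a rough sketch of what is meant (the column names here are only placeholders, not the real ones):

    CREATE TABLE items (
        item_id   INTEGER PRIMARY KEY,
        item_type VARCHAR(50) NOT NULL,     -- e.g. 'dog', 'cat', 'mouse'
        name      VARCHAR(100)
    );

    -- one row per additional property of an item (key/value style)
    CREATE TABLE item_data (
        item_id    INTEGER     NOT NULL REFERENCES items (item_id),
        prop_name  VARCHAR(50) NOT NULL,
        prop_value VARCHAR(255),
        PRIMARY KEY (item_id, prop_name)
    );

    -- parent/child tree between items
    CREATE TABLE rel_items (
        parent_id INTEGER NOT NULL REFERENCES items (item_id),
        child_id  INTEGER NOT NULL REFERENCES items (item_id),
        PRIMARY KEY (parent_id, child_id)
    );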

If we have some bigger queries with some joins and conditions (for example, getting items together with their parent items, with conditions involving other items, and so on), could this become a performance issue compared to splitting all the different types of items into separate tables (dog, cat, mouse) and not merging them into a single one?
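A "bigger query" in this sense might look something like this (again with placeholder names):

    -- dogs whose parent item is a cat with a given property
    SELECT c.item_id, c.name
    FROM   items     c
    JOIN   rel_items r   ON r.child_id   = c.item_id
    JOIN   items     p   ON p.item_id    = r.parent_id
    JOIN   item_data pd  ON pd.item_id   = p.item_id
                        AND pd.prop_name = 'fur_color'
    WHERE  c.item_type  = 'dog'
    AND    p.item_type  = 'cat'
    AND    pd.prop_value = 'black';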

If we keep it all in one basic item table, does creating views (dog, cat, mouse) impact performance somehow?

edit (as commented below): I thought of "species", "house-pets" and so on as item_types. Each type has different properties. The intention of using a basic item table and the item_data table is to have a basic "object" and to attach as many properties to it as necessary without having to modify the database schema. For example, I don't know how many animals there will be in the application or what properties they will have, so I thought of a database schema that doesn't need to be altered each time the user creates a new animal.
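So adding a completely new animal type would just be data, not a schema change, for example:

    -- a new animal type with new properties, no ALTER TABLE needed
    INSERT INTO items (item_id, item_type, name) VALUES (42, 'parrot', 'Polly');
    INSERT INTO item_data (item_id, prop_name, prop_value) VALUES (42, 'can_talk', 'yes');
    INSERT INTO item_data (item_id, prop_name, prop_value) VALUES (42, 'wingspan_cm', '25');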


Solution

If we have some bigger queries with some joins ..., could this become a performance issue compared to splitting all different types of items into separate tables (dog, cat, mouse) and not merging them into a single one?

No.

If we keep it all in one basic item table, does creating views (dog, cat, mouse) impact performance somehow?

No.
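To make this concrete: a view like that is just a stored query, and for a simple definition the optimizer will typically merge it into the calling query, so it costs the same as writing the filter by hand. A sketch (not from the original post):

    CREATE VIEW dog AS
    SELECT item_id, name
    FROM   items
    WHERE  item_type = 'dog';

    -- querying the view ...
    SELECT * FROM dog WHERE name = 'Rex';

    -- ... is typically executed just like the hand-written equivalent:
    SELECT item_id, name FROM items WHERE item_type = 'dog' AND name = 'Rex';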

Separate tables means they're fundamentally different things -- different attributes, different operations, or both.

Same table means they're fundamentally the same things -- same attributes and same operations.

Performance is not the first consideration.

Meaning is the first consideration.

After you sort out what these things mean, and what the real functional dependencies among the items are, then you can consider join performance.

"Dog, cat, mouse" are all mammals. One table.

"Dog, cat, mouse" are two carnivores and one omnivore. Two tables.

"Dog, cat, mouse" are two conventional house-pets and one conventional pest. Two tables.

"Dog, cat, mouse" are one cool animal and two nasty animals. Two tables.

"Dog, cat, mouse" are three separate species. Three tables.

It's about the meaning.
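As a rough illustration of how that decision shows up in the DDL (columns invented for the example):

    -- "they are fundamentally the same thing" -> one table, one discriminator column
    CREATE TABLE mammal (
        mammal_id INTEGER PRIMARY KEY,
        species   VARCHAR(20) NOT NULL,    -- 'dog', 'cat', 'mouse'
        name      VARCHAR(100),
        weight_kg DECIMAL(6,2)
    );

    -- "they are fundamentally different things" -> one table per thing,
    -- each with its own attributes
    CREATE TABLE dog   (dog_id   INTEGER PRIMARY KEY, name VARCHAR(100), breed       VARCHAR(50));
    CREATE TABLE cat   (cat_id   INTEGER PRIMARY KEY, name VARCHAR(100), indoor_only CHAR(1));
    CREATE TABLE mouse (mouse_id INTEGER PRIMARY KEY, name VARCHAR(100), cage_number INTEGER);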

OTHER TIPS

The attempt to build a schema that can accommodate new objects, ones not analyzed and included when the database was designed, is an idea that comes up over and over again in discussions of relational databases.

In classical relational data modeling, relations are devised in the light of certain propositions that are to be asserted about the universe of discourse. These propositions are the facts that users of the data can obtain by retrieving data from the database. Base relations are asserted by storing something in the database. Derived relations can be obtained by operations on the base relations. When an SQL database is built using a relational data model as a guide, base relations become tables and derived relations become views.
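A small invented example of that mapping:

    -- base relation: a fact is asserted by storing a row
    CREATE TABLE animal (
        animal_id INTEGER PRIMARY KEY,
        species   VARCHAR(20) NOT NULL,
        is_pet    CHAR(1)     NOT NULL        -- 'Y' or 'N'
    );
    INSERT INTO animal (animal_id, species, is_pet) VALUES (1, 'dog', 'Y');

    -- derived relation: obtained by an operation on the base relation
    CREATE VIEW house_pet AS
    SELECT animal_id, species
    FROM   animal
    WHERE  is_pet = 'Y';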

But all of this presupposes that the attributes are discovered during data analysis, before database design begins.

In practice, over the last 25 years, most databases have been built on the basis of analysis later revealed to have been incomplete or incorrect. Databases then get revised in the light of new and improved analysis, and the revised database sometimes requires application code maintenance. To be sure, the relational model and the SQL databases created fewer application dependencies than the pre-relational databases did.

But it's natural to try to come up with a generic data schema like yours, one that can accommodate any subject matter whatsoever with no schema changes. There are consequences to this approach, and they involve far greater costs than mere performance issues. For small projects, these costs are quite manageable, and a completely generic schema may work well in those cases.
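To sketch what those costs look like in practice (table and column names assumed): in the generic schema, every attribute you want to filter on needs its own join against item_data, while a conventional per-type table with real columns answers the same question from one row.

    -- generic (key/value) schema: one join per attribute just to ask
    -- "which dogs are brown and weigh more than 10?"
    SELECT i.item_id, i.name
    FROM   items i
    JOIN   item_data pc ON pc.item_id = i.item_id AND pc.prop_name = 'color'
    JOIN   item_data pw ON pw.item_id = i.item_id AND pw.prop_name = 'weight_kg'
    WHERE  i.item_type = 'dog'
    AND    pc.prop_value = 'brown'
    AND    CAST(pw.prop_value AS DECIMAL(6,2)) > 10;   -- values are stored as text

    -- conventional schema (assuming a dog table with real columns):
    SELECT dog_id, name
    FROM   dog
    WHERE  color = 'brown'
    AND    weight_kg > 10;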

But in the very big cases, where there are dozens of entity types and hundreds of relevant propositions based on those entities and their relationships, the attempt to build a schema that is "subject matter agnostic" has often resulted in disaster. These development disasters are well documented, and the larger disasters involve millions of dollars of wasted effort.

I can't prove to you that such an approach has to lead to disaster. But learning from other people's mistakes is often much more worthwhile than taking the risk of repeating them.

Of course, accessing data in a joined table WILL always be slower. But with proper indexes it might be an acceptable slowdown (something like 2x).

I would move the common fields you use in queries into the items table, and leave in item_data only the values you need for display, i.e. those not used in WHERE and JOIN conditions.
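A sketch of what that could look like, with assumed index and column names:

    -- find items by property value without scanning all of item_data
    CREATE INDEX idx_item_data_prop ON item_data (prop_name, prop_value);

    -- promote a property that is often used in WHERE/JOIN into items itself ...
    ALTER TABLE items ADD COLUMN color VARCHAR(30);
    CREATE INDEX idx_items_type_color ON items (item_type, color);

    -- ... so the hot query no longer needs the extra join
    SELECT item_id, name
    FROM   items
    WHERE  item_type = 'dog'
    AND    color = 'brown';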

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow