Question

Say that I have two tables like those:

Employers (id, name, .... , deptId).
Depts(id, deptName, ...).

But Those data is not going to be modified so often and I want that a query like this

SELECT name, deptName FROM Employers, Depts 
    WHERE deptId = Depts.id AND Employers.id="ID"

be as faster as it can.

To my head comes two possible solutions:

  • Denormalize the table:

    Despite that with this solution I will lose some of the great advantages of have "normalized databases, but here the performance is a MUST.

  • Create a View for that Denormalize data.

    I will keep the Data Normalized and (here is my question), the performance of a query over that view will be faster that without that view.

Or another way to ask the same question, the View is "Interpreted" every time that you make a query over it, or how works the views Stuff in a DBA?.

Was it helpful?

Solution

Generally, unless you "materialize" a view, which is an option in some software like MS SQL Server, the view is just translated into queries against the base tables, and is therefore no faster or slower than the original (minus the minuscule amount of time it takes to translate the query, which is nothing compared to actually executing the query).

How do you know you've got performance problems? Are you profiling it under load? Have you verified that the performance bottleneck is these two tables? Generally, until you've got hard data, don't assume you know where performance problems come from, and don't spend any time optimizing until you know you're optimizing the right thing - 80% of the performance issues come from 20% of the code.

OTHER TIPS

If Depts.ID is the primary key of that table, and you index the Employers.DeptID field, then this query should remain very fast even over millions of records.

Denormalizing doesn't make sense to me in that scenario.

Generally speaking, performance of a view will be almost exactly the same as performance when running the query itself. The advantage of a view is simply to abstract that query away, so you don't have to think about it.

You could use a Materialized View (or "snapshot" as some say), but then your data is only going to be as recent as your last refresh.

In a comment to one of the replies, the author of the question explains that he is looking for a way to create a materialized view in MySQL.

MySQL does not wrap the concept of the materialized view in a nice package for you like other DBMSes, but it does have all the tools you need to create one.

What you need to do is this:

  1. Create the initial materialization of the result of your query.
  2. Create a trigger on insert into the employers table that inserts into the materialized table all rows that match the newly inserted employer.
  3. Create a trigger on delete in the employers table that deletes the corresponding rows from the materialized table.
  4. Create a trigger on update in the employers table that updates the corresponding rows in the materialized table.
  5. Same for the departments table.

This may work ok if your underlying tables are not frequently updated; but you need to be aware of the added cost of create/update/delete operations once you do this. Also you'll want to make sure some DBA who doesn't know about your trickery doesn't go migrating the database without migrating the triggers, when time comes. So document it well.

Sounds like premature optimisation unless you know it is a clear and present problem.

MySQL does not materialise views, they are no faster than queries against the base tables. Moreover, in some cases they are slower as they get optimised less well.

But views also "hide" stuff from developers maintaining the code in the future to make them imagine that the query is less complex than it actually is.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top