Question

As a software developer and an aspiring DBA, I try to incorporate best practices when I design my SQL Server databases (99% of the time my software sits on top of SQL Server). I make the best design I can prior to and during development.

But, just like any other software developer, added functionality, bugs, and plain changes in requirements demand altered or newly created database objects.

My question is: should query tuning be proactive or reactive? In other words, a few weeks after some heavy code/database modification, should I set aside a day to examine query performance and tune based on what I find, even if everything seems to be running okay?

Or should I simply treat less-than-average performance as the cue for a database check and a trip back to the proverbial chalkboard?

Query tuning can take up a lot of time, and depending on the initial database design it may yield minimal benefit. I'm curious as to the accepted modus operandi.

Solution

Both, but mostly proactive

It's important to test during development against realistic volumes and quality of data. It's unbelievably common to have a query that runs fine against a developer's 100 or 1,000 rows, then falls flat with 10 million production rows.
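One cheap way to get production-like volume in development is to fabricate rows with a numbers CTE. A sketch, where dbo.Orders and its columns are hypothetical stand-ins for your own schema:

```sql
-- Inflate a dev table to production-like volume by cross-joining
-- system views to fabricate rows quickly.
WITH n AS (
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.Orders (CustomerId, OrderDate, Amount)
SELECT i % 50000 + 1,                                    -- spread across ~50k customers
       DATEADD(DAY, -CAST(i % 730 AS int), GETDATE()),  -- dates over the last two years
       (i % 997) * 1.37                                  -- varied amounts
FROM n;
```

Row counts alone aren't the whole story: skew the generated values so their distribution resembles production, since that is what drives the optimizer's choices.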

Testing like this also lets you make notes such as "an index may help here", or "revisit me", or "will be fixed by new feature xxx in the next DB version".

However, a few queries won't stand the test of time. Data distribution changes, or runtime suddenly blows up because the optimiser decides to use a different join type. In these cases, you can only react.

That said, for SQL Server at least, the various "missing index" and "longest query" DMV queries can indicate problem areas before the phone call comes.
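For example, two common variations (the TOP thresholds and ordering are a matter of taste):

```sql
-- "Longest query": top statements by total elapsed time since the
-- plan cache was last cleared.
SELECT TOP (10)
       qs.execution_count,
       qs.total_elapsed_time / qs.execution_count AS avg_elapsed_us,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;

-- "Missing index": suggestions the optimizer has logged, weighted by
-- estimated impact. Treat these as hints to investigate, not commands.
SELECT TOP (10)
       migs.avg_user_impact,
       migs.user_seeks,
       mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups AS mig
  ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details AS mid
  ON mid.index_handle = mig.index_handle
ORDER BY migs.avg_user_impact * migs.user_seeks DESC;
```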

Edit: to clarify...

Proactive doesn't mean tune every query now. It means tuning what you need to (the frequently run queries) to a reasonable response time. Mostly ignore the weekly 3 am Sunday report queries.

OTHER TIPS

OK, I'll bite and take a contrarian view. First off, I would say you should never start by doing something that you know will lead you into trouble. If you'd like to call this applying best practices, go ahead. This is as far as being proactive should go.

After that, time's (and money's) a-wasting, so get on with it and deliver your product. Instead of spending a ton of design time tuning queries that may or may not turn out to be bottlenecks, use that time for extra testing, including load testing.

When you find out that something isn't performing up to your design specs, or if something is falling into the bottom 10% or 20% of your profiler's list of response times, then invest the time you need to tweak whatever it is that is broken.

In a perfect world everything would be designed perfectly from the start and developed using a logical build sequence. In the real world there are constraints on budget and time, and your test data might not end up looking like your production data. For this reason, I say use common sense to avoid problems proactively, but concentrate your limited resources on tuning the things that turn out to be real problems instead of spending time and money you probably don't have looking for imaginary or potential problems.

You're going to be doing three types of tuning: one reactive and two proactive.

Reactive

Out of the blue, some query starts causing you problems. It could be because of an application bug or feature, a table growing in excess of expectations, a traffic spike, or the query optimizer getting "creative". This could be a middle-of-the-night oh-crap-the-site's-down type of affair, or it could be in response to system slowness of a non-critical nature. Either way, the defining characteristic of reactive tuning is that you already have a problem. Needless to say, you want to be doing as little of this as possible. Which brings us to...

Proactive

Type 1: Routine Maintenance

On some sort of schedule, every few months or weeks depending on how often your schema changes and how fast your data grows, you should review the output of your database's performance analysis tools (e.g. AWR reports for Oracle DBAs). You're looking for incipient issues, that is, things that are on their way to requiring reactive tuning, as well as low-hanging fruit: items that aren't likely to cause problems soon but can be improved with little effort in the hope of preventing far-future problems. How much time you should spend on this will depend on how much time you have and what else you could be spending it on, but the optimal amount is never zero.
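On SQL Server (2016 and later), Query Store collects the equivalent raw material. A minimal review query might look like this sketch; Query Store must be enabled on the database, and the threshold and aggregation are up to you:

```sql
-- Top queries by average duration across collected runtime stats.
SELECT TOP (20)
       q.query_id,
       OBJECT_NAME(q.object_id)      AS containing_object,  -- NULL for ad hoc SQL
       SUM(rs.count_executions)      AS executions,
       AVG(rs.avg_duration) / 1000.0 AS avg_duration_ms     -- avg_duration is in microseconds
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p
  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
  ON rs.plan_id = p.plan_id
GROUP BY q.query_id, q.object_id
ORDER BY AVG(rs.avg_duration) DESC;
```

However, you can easily reduce the amount you need to spend by doing more of...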

Type 2: Proper Design

Knuth's admonition against "premature optimization" is widely known and duly respected. But the proper definition of "premature" must be used. Some application developers, when permitted to write their own queries, have a tendency to adopt the very first query they hit upon that is logically correct, and pay no mind whatsoever to performance, present or future. Or they may test against a development data set that simply isn't representative of the production environment (tip: Don't do this! Developers should always have access to realistic data for testing.). The point is that the proper time to tune a query is when it is first being deployed, not when it shows up on a list of poor-performing SQL, and definitely not when it causes a critical issue.
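Concretely, "tuning at deployment" can be as simple as measuring the candidate query before it ships. A minimal sketch, in which the query and the dbo.Orders table are hypothetical placeholders:

```sql
-- Measure I/O and CPU for a candidate query against realistic data
-- before it ships. dbo.Orders is a hypothetical stand-in.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT CustomerId, SUM(Amount) AS MonthTotal
FROM dbo.Orders
WHERE OrderDate >= DATEADD(MONTH, -1, GETDATE())
GROUP BY CustomerId;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Logical reads and CPU time against production-sized data tell you far more than wall-clock time on a quiet dev box.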

So what would qualify as a premature optimization in DBA land? At the top of my list would be sacrificing normalization without a demonstrated need. Sure you could maintain a sum on a parent row rather than calculating it at runtime from the child rows, but do you really need to? If you're Twitter or Amazon, strategic de-normalization and pre-calculation can be your best friends. If you're designing a little accounting database for 5 users, proper structure to facilitate data integrity needs to be top priority. Other premature optimizations are likewise a matter of priorities. Don't spend hours tweaking a query that gets run once a day and takes 10 seconds, even if you think you can cut it to 0.1 seconds. Maybe you have a report that runs for 6 hours daily, but explore scheduling it as a batch job before investing time in tuning it. Don't invest in a separate, real-time replicated reporting instance if your production load never floats above 10% (assuming you can manage the security).
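On the normalization point specifically: when a demonstrated need does arrive, SQL Server offers a middle ground. An indexed view keeps a pre-calculated aggregate maintained by the engine without denormalizing the base tables. A sketch with illustrative names (note that Amount must be non-nullable for SUM in an indexed view):

```sql
-- The engine maintains this materialized sum on every write to
-- dbo.InvoiceLines, so the base schema stays normalized.
CREATE VIEW dbo.v_InvoiceTotals
WITH SCHEMABINDING              -- required for indexed views
AS
SELECT InvoiceId,
       SUM(Amount)  AS InvoiceTotal,
       COUNT_BIG(*) AS LineCount   -- required when the view uses GROUP BY
FROM dbo.InvoiceLines
GROUP BY InvoiceId;
GO

CREATE UNIQUE CLUSTERED INDEX IX_v_InvoiceTotals
    ON dbo.v_InvoiceTotals (InvoiceId);
```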

By testing against realistic data, taking educated guesses at growth and traffic patterns (plus allowances for spikes), and applying your knowledge of your platform's optimizer quirks, you can deploy queries that run (close to) optimally not just now, but in the future, and under less-than-ideal conditions. When you apply the proper techniques, query performance can be accurately predicted, and optimized (in the sense of each component being as fast as it needs to be).

(And while you're at it, learn statistics!)

In a perfect world all tuning would be done in the design phase proactively and nothing would be reactive, but the world isn't perfect. You will find that test data sometimes isn't representative, test cases will have been missed, loads will be unexpectedly different, and there will be bugs that cause performance issues. These situations may require some reactive tuning, but that doesn't mean reactive tuning is preferred. The goal should always be to catch these up front.

Your plan for reactive tuning is very pragmatic. When you are testing, you should document expected timings and throughput, and at times you should actually build in analysis that lets you know when production processes are not meeting design specifications. In this way you may be able to identify in advance what code needs to be tuned. You can then determine not just what the problem is, but why you didn't catch it in the design/test phase.
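A sketch of what that built-in analysis might look like; every object and procedure name here is hypothetical:

```sql
-- Record each run of a batch process against its documented spec so
-- regressions surface before users notice.
CREATE TABLE dbo.PerfBaseline (
    ProcessName   sysname PRIMARY KEY,
    MaxExpectedMs int NOT NULL
);

CREATE TABLE dbo.PerfLog (
    ProcessName sysname   NOT NULL,
    RunAt       datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    ElapsedMs   int       NOT NULL,
    OverBudget  bit       NOT NULL
);
GO

DECLARE @t0 datetime2 = SYSUTCDATETIME();

EXEC dbo.NightlyRollup;   -- the process under measurement (hypothetical)

DECLARE @elapsed int = DATEDIFF(MILLISECOND, @t0, SYSUTCDATETIME());

-- Log the run and flag it if it blew past the documented budget.
INSERT INTO dbo.PerfLog (ProcessName, ElapsedMs, OverBudget)
SELECT N'NightlyRollup',
       @elapsed,
       CASE WHEN @elapsed > b.MaxExpectedMs THEN 1 ELSE 0 END
FROM dbo.PerfBaseline AS b
WHERE b.ProcessName = N'NightlyRollup';
```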

For me, performance testing has always been part of the development process. Want to change this table, alter this report, add this feature? As part of testing you make sure that you can compare individual and overall performance to known baselines and/or against the requirements (e.g. some reports run in the background or are otherwise automated, so performance - or rather speed - of every single query in the system isn't always the top priority).

IMHO, this shouldn't be a reactive process at all - you should never wait until a change causes a performance problem in production to start reacting to it. When you make the change in dev/test etc., you should be testing those changes with similar data on similar hardware with the same apps and similar usage patterns. Don't let these changes get rushed out to production and surprise you. This will almost always happen when it isn't convenient to spend a day tuning - budget for that tuning time well in advance.

Licensed under: CC-BY-SA with attribution