Question

I'm currently doing performance and load testing of a complex many-tier system, investigating the effect of different changes, but I'm having trouble keeping track of everything:

  • There are many copies of different assemblies
    • Originally released assemblies
    • Officially released hotfixes
    • Assemblies that I've built containing further additional fixes
    • Assemblies that I've built containing additional diagnostic logging or tracing
  • There are many database patches; some of the above assemblies depend on certain database patches being applied
  • Many different logging levels exist across the different tiers (application logging, application performance statistics, SQL Server profiling)
  • There are many different scenarios; sometimes it is useful to test only one scenario, other times I need to test combinations of different scenarios.
  • Load may be split across multiple machines or only a single machine
  • The data present in the database can change; for example, some tests might be done with generated data, and then later with data taken from a live system.
  • There is a massive amount of potential performance data to be collected after each test, for example:
    • Many different types of application specific logging
    • SQL Profiler traces
    • Event logs
    • DMVs
    • Perfmon counters
  • The database(s) are several GB in size, so where I would have used backups to revert to a previous state, I tend to apply changes to whatever database is left over from the last test, causing me to quickly lose track of things.

I collect as much information as I can about each test I do (the scenario tested, which patches are applied, what data is in the database), but I still find myself having to repeat tests because of inconsistent results. For example, I just did a test which I believed to be an exact duplicate of a test I ran a few months ago, but with updated data in the database. I know for a fact that the new data should cause a performance degradation, however the results show the opposite!

At the same time I find myself spending a disproportionate amount of time recording all these details.

One thing I considered was using scripting to automate the collection of performance data etc., but I wasn't sure this was such a good idea: not only is it time spent developing scripts instead of testing, but bugs in my scripts could cause me to lose track of things even quicker.

I'm after some advice/hints on how to better manage the test environment, in particular how to strike a balance between collecting everything and actually getting some testing done, at the risk of missing something important.


Solution

Scripting the collection of the test parameters and environment is well worth checking out. If you're testing across several days and the scripting takes a day, it's time well spent. If after a day you can see it won't be finished soon, re-evaluate and possibly stop pursuing this direction.

But you owe it to yourself to try it.
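For instance, here is a minimal Python sketch of what "scripting the collection of test parameters and environment" could look like: fingerprint the deployed assemblies and write a small manifest for each run. The folder paths, the scenario name, and the commented-out patch-level query are assumptions to adapt to your own setup, not part of the original answer.

```python
# Minimal sketch: snapshot the test environment into a JSON manifest before each run.
# The paths and the patch-level query are assumptions -- adjust them to your system.
import hashlib
import json
import os
from datetime import datetime, timezone

ASSEMBLY_DIRS = [r"C:\App\bin", r"C:\App\Services\bin"]   # hypothetical deployment folders
MANIFEST_DIR = r"C:\PerfTests\manifests"                  # hypothetical results location

def file_fingerprints(directory):
    """Hash every assembly so 'which build was this?' is answerable later."""
    fingerprints = {}
    for root, _, files in os.walk(directory):
        for name in files:
            if name.lower().endswith((".dll", ".exe")):
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    fingerprints[path] = hashlib.sha1(f.read()).hexdigest()
    return fingerprints

def build_manifest(scenario, notes=""):
    timestamp = datetime.now(timezone.utc).isoformat()
    manifest = {
        "timestamp": timestamp,
        "scenario": scenario,
        "notes": notes,
        "assemblies": {d: file_fingerprints(d) for d in ASSEMBLY_DIRS if os.path.isdir(d)},
        # Record the DB patch level however your patches expose it, e.g. a version table:
        # "db_patch_level": query_scalar("SELECT MAX(PatchId) FROM dbo.SchemaPatches"),
    }
    os.makedirs(MANIFEST_DIR, exist_ok=True)
    out_path = os.path.join(MANIFEST_DIR, f"manifest_{timestamp.replace(':', '-')}.json")
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return out_path

if __name__ == "__main__":
    print(build_manifest(scenario="checkout-load", notes="hotfix build + extra tracing"))
```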

OTHER TIPS

I would tend to agree with @orip: scripting at least part of your workload is likely to save you time. You might take a moment to ask which tasks consume the most of your labor and how amenable they are to automation. Scripts are especially good at collecting and summarizing data - much better than people, typically. If the performance data requires a lot of interpretation on your part, you may have problems.
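As a rough illustration of the "collecting and summarizing" part, the sketch below condenses a single perfmon counter into a few headline figures, assuming the log has been exported to CSV with one counter per column; the file path and counter name are made up for the example.

```python
# Minimal sketch: summarize one counter column from a perfmon log exported to CSV.
# The file name and the column header are assumptions; adapt them to your counters.
import csv
import statistics

def summarize_counter(csv_path, column_header):
    values = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            raw = (row.get(column_header) or "").strip()
            try:
                values.append(float(raw))
            except ValueError:
                pass  # skip blank or invalid samples
    return {
        "samples": len(values),
        "mean": statistics.mean(values) if values else None,
        "max": max(values) if values else None,
        "p95": statistics.quantiles(values, n=20)[18] if len(values) >= 20 else None,
    }

if __name__ == "__main__":
    print(summarize_counter(
        r"C:\PerfTests\run42\perfmon.csv",                    # hypothetical export
        r"\\WEBSRV01\Processor(_Total)\% Processor Time",     # hypothetical counter
    ))
```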

An advantage to scripting some of these tasks is that you can then check them in alongside the source / patches / branches, and you may find you impose organizational structure on your system's complexity rather than struggling to chase it as you do now.

If you can get away with testing against only a few set configurations, that will keep the administration simple. It may also make it easier to put one on each of several virtual machines, which can be quickly redeployed to give clean baselines.

If you genuinely need the complexity you describe, I'd recommend building a simple database to allow you to query your multivariate results. Having a column for each of the important factors will allow you to answer questions like "which testing config had the lowest variance in latency?" and "which test database allowed the most bugs to be raised?". I use sqlite3 (probably through the Python wrapper or the Firefox plug-in) for this kind of lightweight collection, because it keeps the maintenance overhead relatively low and avoids perturbing the system under test too far, even if you need to run on the same box.
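A minimal sketch of that kind of results database, using Python's built-in sqlite3 module; the column names and the example query are illustrative assumptions rather than a prescribed schema.

```python
# Minimal sketch: a lightweight results database with one column per important factor.
import sqlite3

conn = sqlite3.connect("perf_results.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS test_run (
        run_id          INTEGER PRIMARY KEY,
        run_date        TEXT,
        scenario        TEXT,
        db_patch_level  TEXT,
        data_set        TEXT,     -- e.g. 'generated' or 'live-copy'
        logging_level   TEXT,
        machines        INTEGER,
        mean_latency_ms REAL,
        latency_stddev  REAL,
        bugs_raised     INTEGER
    )
""")
conn.commit()

# "Which testing config had the lowest variance in latency?"
rows = conn.execute("""
    SELECT scenario, db_patch_level, data_set, MIN(latency_stddev) AS best_stddev
    FROM test_run
    GROUP BY scenario, db_patch_level, data_set
    ORDER BY best_stddev
    LIMIT 5
""").fetchall()
for row in rows:
    print(row)
conn.close()
```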

Scripting the tests will make them quicker to execute and permit results to be gathered in an already-ordered way, but it sounds like your system may be too complex to make this easy to do.
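Even so, a thin per-run wrapper can get you part of the way without scripting the whole system: give each run a timestamped folder and copy the interesting artifacts into it. The load-test command and log locations below are placeholders, not real tooling.

```python
# Minimal sketch of a per-run wrapper: every test gets a timestamped folder and the
# logs are copied into it, so results arrive already ordered.
import os
import shutil
import subprocess
from datetime import datetime

LOG_SOURCES = [r"C:\App\Logs", r"C:\PerfTests\profiler"]   # hypothetical artifact locations

def run_test(scenario):
    run_dir = os.path.join(r"C:\PerfTests\runs", f"{datetime.now():%Y%m%d_%H%M%S}_{scenario}")
    os.makedirs(run_dir)
    # Kick off the load test however you normally do; this command is a placeholder.
    subprocess.run(["loadtest.exe", "--scenario", scenario], check=True)
    for source in LOG_SOURCES:
        if os.path.isdir(source):
            shutil.copytree(source, os.path.join(run_dir, os.path.basename(source)))
    return run_dir

if __name__ == "__main__":
    print(run_test("checkout-load"))
```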

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow