stress testing web applications on less capable hardware

https://stackoverflow.com/questions/13015616

13-07-2021
|

Question

My organization is having an interesting internal debate right now that raises a question that I would like to open to the community at large.

The issue at hand is our environment in which we do stress-testing, capacity-testing, performance-regression-testing, and the like.

On one side of the debate are some software engineers who would like this environment to mirror the production environment as much as possible, in the interest of making the results as meaningful as possible. While we currently do have an environment for such testing, it is far less capable than the production system, and these software engineers feel that they are reaching the limits of what they can learn from it.

On the other side of the debate are some network engineers who both administer the environments and control the purse-strings. While they concede that capacity-testing would be better in an environment that is a better replica of the production environment, they argue that – for the purposes of stress testing – a more modest environment would have the effect of magnifying performance bottlenecks, making them easier to discover and fix.

This finally brings us to the part that piqued my interest: one software engineer suggests that while a more modest stress environment will increase the likelihood that you will encounter some bottleneck, it does not necessarily follow that it would help you find the next bottleneck you may encounter in production. The scaling effect, he argues, may not be linear.

Is there merit to that point of view? If yes, then why? What are the sources of that nonlinearity?

There are a lot of moving parts involved here: a cluster of java application servers, a cluster of database servers, lots of dynamic content being generated for each HTTP hit.

Edit: I appreciate everybody's thoughts so far, but I was really hoping that someone would do more than re-affirm one side or the other and actually tackle the question of "why". If there is such nonlinearity, what gives rise to it? Better yet it would be great if the reasons were expressed in terms of the CPU, memory, bandwidth, latency, interactions between subsystems, what have you... TerryE, you have come the closest. You should re-post your comment as an answer for the bounty if no one else steps up

Solution

The assumption of the network engineers is that modest system is basically a scale model of the production system. They are also assuming that the various characteristics of the production environment which would be affecting the software performance are mirrored in the more modest system just at lower levels however in the same ratios. For instance, the CPU is not as fast, there is not quite as much memory, the storage is a bit slower, etc. and all of these differences are in similar ratios such that if everything were magically multiplied by some factor, say 1.77, the resulting changed modest system would be exactly like the production system.

However that the modest system is an exact scale model in all particulars of the production system is difficult for me to believe.

Here is a specific example. Lets say that measurements on the production system indicates that CPU utilization, the percentage of time the CPU is not idle, is too high. So you put the software on the modest system and do measurements and discover that on the modest system, the CPU utilization is lower. An investigation reveals that the modest system has slower storage so the CPU is spending more time idle waiting on data transfer from storage to complete because the application is I/O bound on the modest system where as on the production system it is not. This difference is due to the modest machine not being an exact scale model of the production machine because the CPU ratio is different than the I/O transfer ratio.

Another example would be having more memory allowing fewer page faults in the production environment. When the software is loaded onto the more modest machine, there are more page faults due to having less physical memory. With the various applications paging in and out, they begin to affect each other as pages of other applications are swapped out and then swapped back in again. On the product machine with larger memory, this cascading page fault behavior is not seen because there is sufficient memory to hold more applications simultaneously.

The point that I am really trying to make here is that a computer with all its various parts and applications is a complex, dynamic system. The idea that one computing environment is just a scale model of another is too simplistic of an assumption. Using a modest system can certainly provide valuable data. However once the gross adjustments have been made to the software and you are beginning to get into more subtle detailed adjustments, the differences in the environment can have a large impact on the results of the testing.

Some citations. Computer systems are dynamical systems by Tod Mytkowicz, Amer Diwan, and Elizabeth Bradley.

Bayesian fault detection and diagnosis in dynamic systems by Uri Lerner, Ronald Parr, Daphne Koller, and Gautam Biswas.

OTHER TIPS

Your software developer is right and I will take the point even further.

When you test an application components, like a web service, to see its behaviour under load, it is understandable to use a less capable environment. You can find the bottlenecks about memory, io etc. And most probably will find bugs and oversights like out of memory errors and log files getting huge.

But when your application components are running as intended and you need to test the whole shebang, you need to test the real environment.

When you run stress tests on an environment, you measure that environment's behaviour under load and its bottlenecks. While this tests may provide valuable information, this information will not be about your production system. The bottlenecks you find might not be relevant to your real system and you may spend precious development time to fix the bugs that do not exist. To know about bottlenecks you really might face with, you should run your stress tests on your real production system (preferably before the grand opening).

I have encountered the similar situation in my production environment. We use modest system just for initial and basic level testing and findings. It is true that you can never find real bottlenecks and other performance issues on your testing environment. So to find real performance related issues and to find bottlenecks you must do it on production environment, there's no other way.

We have hosted over 2.5Million websites, although it might not be case with yours but let me tell you this, that in our case, we have faced horrible situations of linear bottlenecks. Meaning, we first faced memory issues when our traffic was getting increased. We resolved that by adding more memory. Until then we didn't even notice that having only 256 threads of httpd was our next bottleneck because limited memory was hiding it, once we resolved memory issue it quickly came down to the problem that why our webservers were slow again after just few weeks? We found out that 256 httpd threads are just not enough to serve that much traffic. We not only increased threads but also installed HA parallel load balancers in front of our WebServers to mitigate the issue.

Fortunately it solved our slow page loading problems. But after few months as traffic continuously grew we got into next bottleneck of storage system. You know what this time disk I/Os was the issue. To make the story short, we put parallel NFS based physical storage systems. Each NFS machine now serve files by having over 2000 threads running.

I forgot to mentioned that Database was also a big culprit of slowness that we resolved that issue by installing Master-Slaves model in cluster. We had to do a lot of performance tweaks in our application code as well and we had to physically distribute our application into different modules over different servers.

I'm just mentioning all this to prove a point that it's very likely that all performance related issues almost come in a linear way, at least that's what we have faced in our WebBased model. Even you have tested a lot on your modest systems you still have chances hidden bottlenecks which you can't find on testing environments.

What I have learned in my last 6 years experience that try to DISTRIBUTE your model AS MUCH AS POSSIBLE if you think you might going to have a lot of traffic or hits/sec. Centralized model can hold your traffic for some time by doing much tweaks but in the end your system gets busted.

I'm not saying you will face some bottlenecks or issues in your situation but I just wanted to warn you that these cases happen sometimes, just so you better aware.

**Sorry for my English.

good question. learning and optimization is best on modest hardware. but testing is safer on mirror (or at least something from same epoch)

it seams like you try to predict the first bottleneck that will appears and when it will happen. i'm not sure if that's the correct objective and the correct way. i assume we don't speak about a typical CRUD where client says 'it should work as fast as every other web application'. if you want to do tests correctly then, before you start your tests, you should know the expected load. expected number of users, expected number of events, response time etc. it's a part of your product specification. if you don't have the numbers, that means your analysts didn't do their job.

if you have the numbers then you don't need exact tests result. you just need to know the order of magnitude. you should also check how your software/hardware scale. how many instances do you need to handle x users/requests/whatever and how many to handle y

We load test systems for our customers every day -- and we see a wide range of problems. Certain classes of problems can be found on down-sized systems. Other cannot. Some can ONLY be found in production...because no matter how closely you mirror the two systems, they can never be identical. You can get REALLY close, if you work hard enough.

So, simple fact of testing: the closer your system is to the production system, the more accurate your tests will be.

IMO, this is one of the best reasons for moving to the cloud: you can spin up a system that is very close to your production system (about as identical as you could ever get) and run your load tests on that.

It is probably worth mentioning that we've occasionally seen customers waste a lot of hours chasing problems in their test environments that never would have occurred in production. The more different the environments are, the more likely this is to happen :(

I think you have partially answered your own question - you already have a production level environment and are already finding it is not at the same level / not as capable as the production environment. The bottom line is that with all the money in the world you will never be able to replicate the exact functioning of the production website - timings of events, volumes, cpu utilisation, memory utilisation, db IO, when it's all working in anger the behaviour can be non-deterministic to a certain extend. My point is you can never make it exactly the same. And on the other side of the coin a production environment by it's nature is going to be an expensive environment with a lot of kit in order to make it perform and handle your production volume of data / transactions. This is a big expense / overhead to the business, and in these times of frugality should we not be looking to avoid additional cost to the business.

Maybe a different tactic should be taken - learn the performance profile of your production software - how it scales with volume, does running times increase linearly, exponentially or logarithmically? Can you model this? Firstly you can verify that the test environment is behaving in a similar way - this is key to having a valid test. Then the other important part is taking relative tests rather than absolutes - you aren't going to get absolute running times that are the same as production, but run your performance tests before deploying the code changes to give you your baseline, then deploy your code changes and re-run the performance tests - this will give you the relative changes in production (e.g. will the performance degrade with this code release), based on your models of performance you will be able to verify that the software is scaling in the same way with extra volume.

So my viewpoint is that there is a great deal you can learn about your software and hardware performance in the lower environment, and doing this on a smaller / less capable infrastructure saves your company money, and if used right can give you most of your answers to performance testing that you are looking for.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow