Question

We have a single-page (SPA), Ajax-based Java Spring/Hibernate app running in Tomcat 8.5. The app's performance is acceptable, but not lightning-fast. A typical record insert takes 3-4 seconds, e.g.:

05 Nov 2019 14:55:41,686 INFO  Insert Start
...
05 Nov 2019 14:55:45,766 INFO  Insert End

We have all the standard stuff, like DB indexes etc. They are working.

No one is complaining, but it's not an ultra-smooth web app like StackOverflow.com. StackOverflow is so fast that all operations take under 2 seconds.

What are some hardware improvements that can take a Web app to the next level?

  • DB: Increase memory & CPU on the DB server? But I've been told this won't help.
  • Increase JVM Heap Size? Already done: 2GB
  • Tomcat box hardware - memory & CPU?

One other thing I now understand is that Hibernate (and similar ORMs) can be a bad idea for performant apps. They're popular, but the app would see more performance gains without them.

Solution

Before blaming Hibernate for performance issues, you should profile your application. By profiling a given request (if the whole application feels slow, just pick any request), you'll get a more precise picture of what exactly is slow. Depending on what you discover, the solution for improving performance could be radically different. Some examples:

  • The request performs four queries, and then the same fifth query three thousand times, because some developer misunderstood how lazy loading works. By fixing the code, the number of queries could be reduced to just five. (This example is from a real application I audited a few years ago.)

  • The request asks for too much from the database; essentially, when needing one entry, it just loads the whole table containing about five hundred thousand items, and then searches inside. (Unsurprisingly, the example comes from the same project as mentioned in the previous point.)

  • The request spends time using the CPU. If this is the case, the profiling would pinpoint the location where CPU time is used. From there, you can start optimizing your application.

  • The request uses a lot of memory. Check the data structures.

  • The request makes requests to other services. If this is the case, check why those services take so long to respond.

  • The request waits for the database a lot, even when performing very basic queries, while the database is hosted on very capable hardware. Check the connection between the application server and the database server. (For instance, hosting the app server in the USA and the database server in India is not the greatest thing you can do in terms of performance.)

  • The request does a lot of disk writes. Check what exactly is logged: logging a lot and leaving the log level at “verbose” in production is not a good idea, especially if the log is configured to flush every message to disk. (This happened once in a project when a fellow programmer wanted to debug a weird crash in production and, once the problem was fixed, forgot to reset the logging configuration.)
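The first example above, the same query repeated thousands of times, is the classic N+1 problem. A stripped-down sketch of it, with the database replaced by a hypothetical query counter (these are stand-in methods, not a real Hibernate API):

```java
import java.util.ArrayList;
import java.util.List;

public class NPlusOneDemo {
    static int queryCount = 0;

    // Stand-ins for database access; each call counts as one query.
    static List<Integer> loadParentIds() { queryCount++; return List.of(1, 2, 3); }
    static String loadChildFor(int parentId) { queryCount++; return "child-" + parentId; }
    static List<String> loadChildrenFor(List<Integer> ids) {
        queryCount++;
        List<String> out = new ArrayList<>();
        for (int id : ids) out.add("child-" + id);
        return out;
    }

    // Lazy loading gone wrong: one extra query per parent, so 1 + N queries.
    static int naive() {
        queryCount = 0;
        for (int id : loadParentIds()) loadChildFor(id);
        return queryCount;
    }

    // Fetching the children in one batch (what a JOIN FETCH would do): 2 queries.
    static int batched() {
        queryCount = 0;
        loadChildrenFor(loadParentIds());
        return queryCount;
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + naive() + " queries");   // 4
        System.out.println("batched: " + batched() + " queries"); // 2
    }
}
```

With three thousand parents instead of three, the naive version issues 3001 queries where two would do.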

Those are only a few examples of what could be happening under the hood. Again, you have to figure out precisely what causes the slowness, before you try optimizing, especially if your only optimization idea is to switch to a different technology or ask for more expensive hardware. Once you find the bottleneck, you'll probably figure out how to fix it. It may be a simple change in code or configuration. It might be that you need more expensive hardware.
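If a full profiler isn't at hand, even crude per-phase timing inside the slow request will point to the guilty step. A minimal sketch, with hypothetical phase names and simulated work standing in for the real calls:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class RequestTimer {
    private final Map<String, Long> timings = new LinkedHashMap<>();

    // Runs the given work and records how long it took, in milliseconds.
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            timings.put(phase, (System.nanoTime() - start) / 1_000_000);
        }
    }

    public Map<String, Long> report() {
        return timings;
    }

    public static void main(String[] args) {
        RequestTimer timer = new RequestTimer();
        timer.time("validate", () -> "ok");                        // trivially fast
        timer.time("persist", () -> { busyWork(); return null; }); // the slow step
        timer.report().forEach((phase, ms) ->
                System.out.println(phase + ": " + ms + " ms"));
    }

    private static void busyWork() {
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++) sum += i; // simulated expensive call
    }
}
```

Wrapping the validation, persistence, and rendering steps of the real insert this way would show immediately which of them accounts for the 4 seconds.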

A few anecdotes from a few more projects:

  • Once upon a time, a team of programmers developed a web app. It was a bit slow, and the customer was concerned about its performance. Since one of the guys was reading a lot of articles about the benefits of parallel computing, the team decided to optimize the app this way. They then spent a few months fixing bugs that appeared when code never intended for multithreading was run in parallel. Unfortunately, the performance didn't improve. So the guys, who had been complaining for the past six months that they didn't have the hardware they deserved, used this opportunity to obtain octa-core processors to replace the old quad-core ones, but the application seemed to become even slower.

    Finally, a developer from another team figured it out. The problem was not the CPU but the memory (in particular, a high number of unnecessary memory allocations). However, when the application was moved to the octa-core CPUs, it was also reconfigured, so it spent more time creating threads and communicating between them, which indeed made it slower.

  • One of my projects was terribly slow, even though the application wasn't using much CPU or memory: even a simple “Hello World” request would take about two seconds. After a while, I found that a bug in the application, combined with a completely crazy IIS configuration, forced IIS to recycle the application pool after every request, making the application indeed quite clumsy. Fixing the bug and setting the configuration properly helped reduce response times to a few dozen milliseconds.

OTHER TIPS

You're probably not going to like this answer, but... Getting from four seconds to two seconds might require an architectural overhaul. Essentially, this is probably as fast as your Java Spring/Hibernate app running in Tomcat 8.5 is likely to get.

Example:

ASP.NET Web Forms application running on IIS and SQL Server: 4 to 7 seconds per page.

Same app written in ASP.NET MVC running on IIS and SQL Server (like what Stack Overflow does): 1 to 3 seconds per page.

The first step in debugging performance issues is understanding the bottlenecks in your architecture. In a monolithic application (i.e. traditional n-tier development), you have a few main potential bottlenecks:

  • CPU Load
  • Memory utilization
  • Network bandwidth
  • Disk speed

If your application is taking 4 seconds to respond to a request, look at these four things to find out which is maxed out. Each item can have different causes:

  • CPU Maxed out:
    • Look for inefficient algorithms by looking for hotspots in a profiler
    • Increase your CPU to a faster model (pretty limited with this option today)
  • Memory maxed out:
    • Make sure you are not swapping to disk, that is very slow
    • Increase memory either to the JVM or to your server
    • Use a memory profiler to find memory leaks (i.e. memory that should be reclaimed by garbage collection but isn't)
  • Network maxed out:
    • Add another network card?
    • If in a VM environment, co-locate your database VM with your app VM to take advantage of the high-speed inter-VM networking
  • Disk maxed out:
    • Make sure you have indexes on your database to prevent full table scans
    • Minimize disk use
    • Get faster disks (SSDs might be worth the investment, or use RAM disks for temporary files)
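For the memory check above, the JVM can report its own heap usage, which is a cheap first look before attaching a full memory profiler:

```java
public class HeapCheck {
    // Currently allocated heap that is actually in use.
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // The -Xmx ceiling the JVM was started with.
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long used = usedHeapBytes();
        long max = maxHeapBytes();
        System.out.printf("heap: %d MB used of %d MB max (%.1f%%)%n",
                used >> 20, max >> 20, 100.0 * used / max);
    }
}
```

If used heap sits near the maximum while the OS still has free RAM, raising the JVM heap (e.g. beyond the 2GB mentioned in the question) is the cheaper fix; if the whole box is short on RAM, the server itself needs more memory.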

If none of these are maxed out and you still have performance problems, there is a good chance you are suffering from resource locking. If one person is updating a table while another person is querying it, the record(s) being updated may cause the other person to wait. If you can tolerate dirty reads, you can reduce the locking overhead in your database.
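The reader/writer contention described above can be sketched with a ReentrantReadWriteLock standing in for the database's row locks: while one thread holds the write lock (an in-flight update), readers cannot get in.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockContentionDemo {
    // Returns true if a reader thread was blocked while the writer held the lock.
    public static boolean readerBlockedDuringWrite() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock(); // simulate an in-flight UPDATE

        final boolean[] readerGotIn = {false};
        Thread reader = new Thread(() -> readerGotIn[0] = lock.readLock().tryLock());
        reader.start();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        lock.writeLock().unlock();
        return !readerGotIn[0]; // the reader could not acquire the read lock
    }

    public static void main(String[] args) {
        System.out.println("reader blocked while writer held the lock: "
                + readerBlockedDuringWrite()); // true
    }
}
```

Allowing dirty reads is the database-level equivalent of letting the reader skip that lock entirely, at the cost of possibly seeing uncommitted data.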

Bigger Guns

You will inevitably hit a ceiling of what you can do with bigger hardware in a monolithic environment. At that point you really need to think about scaling out. Stack Overflow does a great deal to allow the system to scale out and remain performant. You can go a full microservice route, or just host your monolithic application on multiple servers.

The main thing you must strive for to enable scaling out is to completely avoid server-side sessions. In a "shared nothing" environment, there is no reason to have server sessions. The information that would have gone into session variables either goes into the database or is stored client-side in the browser.
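One way to keep that state out of server sessions, as a sketch: hand the client a value plus an HMAC signature, and verify the signature on the way back in, so any app server can trust the data without shared session storage. The secret key and payload here are illustrative placeholders, and a real implementation would use a constant-time comparison.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StatelessToken {
    // Placeholder secret; in practice this comes from configuration,
    // shared by all app servers behind the load balancer.
    private static final byte[] SECRET =
            "change-me-demo-secret".getBytes(StandardCharsets.UTF_8);

    // Appends an HMAC-SHA256 signature to the payload: "payload.signature".
    public static String sign(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(SECRET, "HmacSHA256"));
            String sig = Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
            return payload + "." + sig;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Re-signs the payload and compares; a real system would compare in constant time.
    public static boolean verify(String token) {
        int dot = token.lastIndexOf('.');
        if (dot < 0) return false;
        return sign(token.substring(0, dot)).equals(token);
    }

    public static void main(String[] args) {
        String token = sign("userId=42");
        System.out.println("token:    " + token);
        System.out.println("valid:    " + verify(token));                    // true
        System.out.println("tampered: " + verify("userId=99." + token.split("\\.")[1])); // false
    }
}
```

This is the same idea behind signed cookies and JWTs: the browser carries the state, and every server can check it independently.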

You'll want to start working with clusters. A database with multiple cluster nodes can spread the work across each node to smooth out the load. The set of application servers can simply host additional copies of the app, with a load balancer in front. Without any need for session affinity, you can use simple round-robin balancing, which is fast.
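Round-robin balancing is simple enough to sketch in a few lines: each request goes to the next server in the list, with no session affinity required. The server names are placeholders.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobin {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobin(List<String> servers) {
        this.servers = servers;
    }

    // Picks the next server in rotation; floorMod keeps the index
    // valid even after the counter overflows.
    public String pick() {
        return servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
    }

    public static void main(String[] args) {
        RoundRobin lb = new RoundRobin(List.of("app-1", "app-2", "app-3"));
        for (int i = 0; i < 5; i++) System.out.println(lb.pick());
        // app-1, app-2, app-3, app-1, app-2
    }
}
```

Because no state ties a user to a particular server, any node can serve any request, which is exactly what makes this trivial scheme safe to use.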

Next you'll need to look into caching servers like Redis or some equivalent. If your resources take time to put together, but don't change often, this is the final piece of the puzzle to make the response times very fast.
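As a sketch of that caching idea, here is a tiny in-process TTL cache standing in for Redis: expensive results are kept until they expire, so repeated requests skip the slow lookup entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns the cached value, or runs the loader and caches its result
    // when the entry is missing or has expired.
    public V get(K key, Supplier<V> loader) {
        Entry<V> e = store.get(key);
        if (e == null || e.expiresAt < System.currentTimeMillis()) {
            e = new Entry<>(loader.get(), System.currentTimeMillis() + ttlMillis);
            store.put(key, e);
        }
        return e.value;
    }

    public static void main(String[] args) {
        TtlCache<String, String> cache = new TtlCache<>(60_000);
        int[] loads = {0};
        Supplier<String> slowLoader = () -> { loads[0]++; return "rendered page"; };
        cache.get("/home", slowLoader);
        cache.get("/home", slowLoader); // served from cache, loader not called again
        System.out.println("loader calls: " + loads[0]); // 1
    }
}
```

A shared cache like Redis does the same thing across all app servers in the cluster, so one server's expensive computation benefits every node.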

If you think of any quick-responding site on the network, I guarantee that they have invested heavily in scaling out rather than scaling up. The degree to which they've done so varies from environment to environment. For example, Stack Exchange has been able to do a lot with a hybrid monolithic architecture at its core, but they are hosting on 9 different web servers (reference).

The bottom line is that the type of changes needed to scale out take a lot of effort, and your daily hosting costs increase as well.

Licensed under: CC-BY-SA with attribution