Question

I've been reading a lot about microservice architectures for server applications, and have been wondering how the internal network usage is not a bottleneck or a significant disadvantage compared to a monolith architecture.

For the sake of precision, here are my interpretations of the two terms:

  1. Monolith architecture: One application in a single language that handles all of the functionality, data, etc. A load balancer distributes requests from the end user across multiple machines, each running one instance of our application.

  2. Microservice architecture: Many applications (microservices), each handling a small portion of the functionality and data. Each microservice exposes a common API that is accessed through the network (as opposed to inter-process communication or shared memory on the same machine). API calls are mostly stitched together on the server to produce a page, although perhaps some of this work is done by the client querying individual microservices.

To my naive imagination, it seems like a microservice architecture relies on slow network traffic instead of faster resources on the same machine (memory and disk). How does one ensure that querying APIs over the internal network will not slow down the overall response time?

Solution

Internal networks often use 1 Gbps connections, or faster. Optical fiber connections or bonding allow much higher bandwidths between the servers. Now imagine the average size of a JSON response from an API. How much of such responses can be transmitted over a 1 Gbps connection in one second?

Let's actually do the math. 1 Gbps is 131 072 KB per second (counting in binary prefixes). If an average JSON response is 5 KB (which is quite a lot!), you can send 26 214 responses per second through the wire with just one pair of machines. Not bad, is it?

This is why network connection is usually not the bottleneck.

Another aspect of microservices is that you can scale easily. Imagine two servers, one hosting the API, the other consuming it. If the connection ever becomes the bottleneck, just add another pair of servers and you can double the throughput.

This helps when our earlier 26 214 responses per second become too small for the scale of the app: add nine more pairs and you can now serve 262 140 responses per second.
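
For readers who want to check the arithmetic, here is a quick back-of-the-envelope version of the figures above in Python (it assumes, as the text does, binary prefixes for the gigabit and a 5 KB average response):

```python
# Back-of-the-envelope check of the figures above.
# Assumes binary prefixes (1 Gb = 1024**3 bits) and a 5 KB average JSON response.
link_bits_per_second = 1024 ** 3                        # "1 Gbps"
link_kb_per_second = link_bits_per_second / 8 / 1024    # 131 072 KB/s

response_kb = 5
per_pair = int(link_kb_per_second / response_kb)        # 26 214 responses/s
print(f"one pair of machines: {per_pair} responses/s")

# Scaling out: ten pairs of machines means ten independent links.
pairs = 10
print(f"{pairs} pairs of machines: {pairs * per_pair} responses/s")  # 262 140
```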

But let's get back to our pair of servers and do some comparisons.

  • If an average non-cached query to a database takes 10 ms, you're limited to 100 queries per second on a single connection, against the 26 214 responses per second the wire can carry (see the sketch after this list). Achieving 26 214 responses per second requires a great amount of caching and optimization (assuming the response actually needs to do something useful, like querying a database; "Hello World"-style responses don't qualify).

  • On my computer, right now, DOMContentLoaded for Google's home page fired 394 ms after the request was sent. That's less than 3 requests per second. For the Programmers.SE home page, it fired 603 ms after the request was sent. That's not even 2 requests per second. By the way, I have a 100 Mbps internet connection and a fast computer: many users will wait longer.

    If the bottleneck is the network speed between the servers, those two sites could literally do thousands of calls to different APIs while serving the page.
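
To make the comparison in the first bullet concrete, here is a minimal sketch putting the two ceilings side by side; the 10 ms query time and the 26 214 responses per second are the assumptions used above, not measurements:

```python
# Compare the two ceilings from the bullets above: a sequential, non-cached
# database query at 10 ms per call vs. a 1 Gbps wire carrying 5 KB responses.
query_time_s = 0.010                        # assumed non-cached DB query time
db_bound_per_second = 1 / query_time_s      # 100 requests/s on one connection

wire_bound_per_second = 26_214              # from the bandwidth calculation above

print(f"database-bound (one connection): {db_bound_per_second:.0f} requests/s")
print(f"wire-bound:                      {wire_bound_per_second} requests/s")
print(f"the wire has ~{wire_bound_per_second / db_bound_per_second:.0f}x headroom")
```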

Those two cases show that the network probably won't be your bottleneck in theory (in practice, you should do actual benchmarks and profiling to determine the exact location of the bottleneck of your particular system on your particular hardware). The time spent doing the actual work (whether it be SQL queries, compression, or whatever) and sending the result to the end user is much more important.

Think about databases

Usually, databases are hosted separately from the web application using them. This can raise a concern: what about the connection speed between the server hosting the application and the server hosting the database?

There are indeed cases where the connection speed becomes problematic: when you store huge amounts of data which don't need to be processed by the database itself but should be available right away (that is, large binary files). Such situations are rare, though: in most cases, the transfer time is small compared to the time spent processing the query itself.

Where transfer speed actually matters is when a company hosts large data sets on a NAS and the NAS is accessed by multiple clients at the same time. This is where a SAN can be a solution. That being said, it is not the only one: Cat 6 cables can support speeds up to 10 Gbps, and bonding can also be used to increase the speed without changing the cables or network adapters. Other solutions exist, involving data replication across multiple NAS devices.

Forget about speed; think about scalability

An important property of a web app is its ability to scale. While raw performance matters (because nobody wants to pay for more powerful servers), scalability is much more important, because it lets you throw additional hardware at the problem when needed.

  • If your app is not particularly fast, you'll lose money because you will need more powerful servers.

  • If you have a fast app which can't scale, you'll lose customers because you won't be able to respond to an increasing demand.

In the same way, a decade ago virtual machines were perceived as a huge performance issue: hosting an application directly on a server vs. hosting it in a virtual machine had a significant performance impact. While the gap is much smaller today, it still exists.

Despite this performance loss, virtual environments became very popular because of the flexibility they give.

As with network speed, you may find that the VM is the actual bottleneck and that, at your scale, you would save a great deal of money by hosting your app directly, without VMs. But this is not what happens for 99.9% of apps: their bottleneck is somewhere else, and the loss of a few microseconds to the VM is easily outweighed by the benefits of hardware abstraction and scalability.

OTHER TIPS

I think you are reading too much into the 'micro' part. It doesn't mean replacing every class with a network service, but rather componentizing a monolithic application into sensibly sized components, each one dealing with one aspect of your program. The services won't talk to each other, so at worst you've split a large network request into several smaller ones. The data returned won't be significantly different from what you'd receive anyway (though you might return more data and consolidate it in the client).

Structure your code and resource access so that the resulting system is flexible enough to run either as a monolithic application or as a distributed one, via configuration. If you abstract the communication mechanism away behind some common interface and build your system with concurrency in mind, you can easily optimize everything after you have profiled the system and found the real bottlenecks.
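
As a minimal sketch of that idea (the OrderService name, the /orders endpoint and both implementations are invented for illustration), the caller depends only on a common interface, and configuration decides whether the call stays in-process or goes over HTTP:

```python
# Minimal sketch: hide the communication mechanism behind one interface so the
# same caller works whether the service runs in-process or remotely.
# The OrderService protocol and the /orders/<id> endpoint are hypothetical.
import json
import urllib.request
from typing import Protocol


class OrderService(Protocol):
    def get_order(self, order_id: str) -> dict: ...


class InProcessOrderService:
    """Monolithic deployment: the data lives in local memory."""
    def __init__(self, orders: dict):
        self._orders = orders

    def get_order(self, order_id: str) -> dict:
        return self._orders[order_id]


class HttpOrderService:
    """Distributed deployment: the same call goes over the network."""
    def __init__(self, base_url: str):
        self._base_url = base_url

    def get_order(self, order_id: str) -> dict:
        with urllib.request.urlopen(f"{self._base_url}/orders/{order_id}") as resp:
            return json.load(resp)


def render_invoice(orders: OrderService, order_id: str) -> str:
    # Caller code is identical in both deployments; which implementation gets
    # injected here is a configuration decision.
    order = orders.get_order(order_id)
    return f"Invoice for {order_id}: total {order['total']}"
```

Profiling then tells you which boundaries are worth keeping remote and which are cheaper to keep in-process.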

As a lot of people have mentioned, it's not about network bottlenecks; it's more about network brittleness. So the first step is to avoid synchronous communication. It's easier than it sounds: all you need is services with the right boundaries. Right boundaries result in services that are autonomous, loosely coupled and highly cohesive. A good service doesn't need information from another service; it already has it. The only way good services communicate is via events. Good services are eventually consistent as well, so there are no distributed transactions.
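
As a toy illustration of the event-based style (an in-memory bus stands in for whatever broker you would really use, and the service and event names are invented), one service publishes a business fact and another keeps its own eventually-consistent copy of the data it needs:

```python
# Toy event bus: services announce business facts instead of calling each other
# synchronously. The in-memory dict stands in for a real broker, and the
# OrderPlaced event plus the Sales/Billing split are hypothetical.
from collections import defaultdict
from typing import Callable

_subscribers: dict = defaultdict(list)


def subscribe(event_type: str, handler: Callable) -> None:
    _subscribers[event_type].append(handler)


def publish(event_type: str, payload: dict) -> None:
    for handler in _subscribers[event_type]:
        handler(payload)


# Billing keeps its own copy of what it needs; it never queries Sales.
billing_view: dict = {}


def on_order_placed(event: dict) -> None:
    billing_view[event["order_id"]] = event["amount"]   # eventually consistent


subscribe("OrderPlaced", on_order_placed)

# Sales announces the fact and moves on; it does not wait for Billing.
publish("OrderPlaced", {"order_id": "42", "amount": 99.0})
print(billing_view)   # {'42': 99.0}
```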

The way to achieve this is to identify your business capabilities first. A business capability is a specific business responsibility, some contribution to the overall business value. Here is the sequence of steps I take when thinking about system boundaries:

  1. Identify the higher-level business responsibilities. There will be a few of them. Treat these services as the steps your organisation walks through to achieve its business goal.
  2. Delve deeper into each service. Identify the lower-level services that make up each parent.
  3. Alongside the first two steps, think about how services communicate. They should do so primarily via events, just to notify each other of their business-process results. Events should not be treated as data conveyors.

Keep in mind that a business service includes people, applications and business processes; usually only part of it is represented as a technical authority.

This may sound a bit abstract, so an example of identifying service boundaries might be of interest.

I'd like to add a different perspective, from a different industry with very different assumptions: distributed (entity-level) simulation. Conceptually, this is a lot like a distributed FPS video game. Key differences: all players share some state (where the dragon is right now), there are no database calls, and everything is held in RAM for speed and low latency; throughput is less relevant (though you can't completely ignore it either).

You could think of each participating application either as a monolith (that represents all facets of a player), or as a microservice (that represents just a single player in a crowd).

My colleagues have been interested in breaking a single participating application down further, into smaller microservices which might be shared, e.g. damage arbitration or line-of-sight computations, things which are usually bundled into the simulations.

The problem is the latency of dispatching calls and waiting for responses. Bandwidth is plentiful and largely irrelevant, as others have pointed out. But if a line-of-sight computation goes from 1 microsecond to 100 microseconds (say, due to queuing in the new microservice shared among all player applications), that's a huge loss: you might need several or many line-of-sight calculations per update, at several updates per second.

Think very carefully about how services work, when they're called and what data is exchanged. Our applications already exchange not just position info but dead-reckoning info: I'm at position x, heading in direction y at speed q. I don't have to update my info until those assumptions change. That means far fewer updates, and latency (while still a problem) comes up proportionally less often.
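
Here is a rough one-dimensional sketch of the dead-reckoning idea; the tolerance, the motion model and the numbers are invented, and only the principle of "send an update only when the shared prediction drifts too far" matters:

```python
# Dead reckoning, 1-D sketch: peers extrapolate my last reported position and
# velocity, so I only send an update when their prediction would drift further
# from my true state than a tolerance. All values are illustrative.
TOLERANCE_M = 0.5        # acceptable prediction error, in metres

last_sent_pos = 0.0
last_sent_vel = 10.0     # m/s, as reported in the last update
last_sent_time = 0.0


def predicted_pos(t: float) -> float:
    """Where every peer believes I am at time t, given my last update."""
    return last_sent_pos + last_sent_vel * (t - last_sent_time)


def maybe_send_update(t: float, true_pos: float, true_vel: float) -> bool:
    """Send new state only when the shared prediction has drifted too far."""
    global last_sent_pos, last_sent_vel, last_sent_time
    if abs(true_pos - predicted_pos(t)) > TOLERANCE_M:
        last_sent_pos, last_sent_vel, last_sent_time = true_pos, true_vel, t
        return True                      # one message instead of one per tick
    return False


# Entity decelerating at 5 m/s^2: most ticks need no network message at all.
for tick in range(1, 11):
    t = tick * 0.1
    true_vel = 10.0 - 5.0 * t
    true_pos = 10.0 * t - 2.5 * t * t
    print(f"t={t:.1f}s  sent_update={maybe_send_update(t, true_pos, true_vel)}")
```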

So rather than requesting a service at a fine grain and a high frequency, try lowering the frequency by:

  1. changing which data are requested and using local computations
  2. sending query or trigger parameters for an asynchronous response
  3. batching requests (see the sketch after this list)
  4. anticipating requests and preparing a response in advance, on speculation (opposite of lazy evaluation)
  5. wherever possible, avoid microservices calling other microservices; this compounds the problem, obviously. I understand this is an incentive to make microservices larger and somewhat defeats the point, but microservices are no friend to latency. Maybe just admit it and get over it.
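
As a small sketch of item 3 (batching), the arithmetic below uses the 100-microsecond dispatch cost mentioned earlier and an assumed 1 microsecond of actual computation per query; the figures are illustrative, not measured:

```python
# Batching sketch: pay the dispatch/queuing latency once per batch instead of
# once per query. Figures are illustrative assumptions, not measurements.
ROUND_TRIP_S = 100e-6        # per-call dispatch and queuing cost
COMPUTE_S = 1e-6             # actual line-of-sight computation per query

queries_per_update = 50
updates_per_second = 10

unbatched = queries_per_update * (ROUND_TRIP_S + COMPUTE_S)
batched = ROUND_TRIP_S + queries_per_update * COMPUTE_S

print(f"per update, unbatched: {unbatched * 1e3:.2f} ms")    # 5.05 ms
print(f"per update, batched:   {batched * 1e3:.2f} ms")      # 0.15 ms
print(f"per second, unbatched: {unbatched * updates_per_second * 1e3:.1f} ms spent waiting")
```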

Now remember to check your assumptions about your system. If you're more concerned with throughput than latency, or don't have shared state, etc., then by all means, use microservices where they make sense. I'm just saying maybe don't use them where they don't make sense.

Your naive imagination is right, and often it doesn't matter: modern machines are fast. The main advantages of a microservice architecture are seen in development and maintenance effort and time.

And of course there is no rule saying you can't use shared memory or even physically deploy multiple services in one executable, as long as you design them not to depend on that.

Just another factor to add to the current answers: prefer coarse-grained services. You want to avoid the latency of all those calls, so instead of making 10 calls you make one call that returns the 10 pieces of data you need in a DTO.
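
A tiny sketch of what that looks like: one hypothetical endpoint that returns a DTO with everything the caller needs, rather than ten round trips (the fields and the function are invented):

```python
# Coarse-grained call: one DTO carrying everything the page needs, aggregated
# server-side, instead of ten separate requests. All names are invented.
from dataclasses import dataclass, field


@dataclass
class CustomerPageDTO:
    name: str
    email: str
    open_orders: list = field(default_factory=list)
    loyalty_points: int = 0
    # ...whatever else the page needs, gathered in a single round trip


def fetch_customer_page(customer_id: str) -> CustomerPageDTO:
    # One network round trip; the service aggregates the pieces internally.
    # Stubbed with static data here for illustration.
    return CustomerPageDTO("Ada", "ada@example.com", ["A-1", "A-2"], 120)


print(fetch_customer_page("c-42"))
```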

And remember that microservices are not as micro as people think.

Licensed under: CC-BY-SA with attribution