What is the best way to run multiple instances of a web app that requires state to be transferred between instances?

https://softwareengineering.stackexchange.com/questions/399168

02-03-2021

Question

I'm working on a web app where users enter a "room." The state of these rooms is loaded in the server's memory at all times, and is synchronized with the clients via websockets. What's the best way to handle rooms when there are multiple instances of my web app?

Should I...

A. Have some kind of load balancer that knows which instances are handling which rooms and directs connections accordingly?

B. Have the web app instances communicate with each other internally, so that all instances that have clients connected to a room are synchronized?

C. Something else?

Solution

"The state of these rooms is loaded in the server's memory at all times"

If you store state in the server's memory, then in a real production environment you will lose that state every now and then, for example when an instance crashes or restarts. Is this acceptable? Normally it isn't. There are two cases:

If it is acceptable to lose the state, and you can partition rooms across servers so that the load on each server stays manageable, you can use your solution A. But note that managing the load might not be simple, depending on how your system works. For example, you might put rooms 1 to 10 on server A, but room 4 turns out to carry a lot of load, so you'll have to move it to another server. It can get very complex to manage.
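To make the routing part of solution A concrete, here is a minimal sketch, not a definitive implementation. It assumes a fixed list of servers and hashes the room ID to pick one, with a manual override table for hot rooms like room 4 in the example above. `RoomRouter`, `server_for`, `move_room`, and the server names are all hypothetical names made up for illustration.

```python
import hashlib


class RoomRouter:
    """Maps room IDs to server names (hypothetical helper, for illustration only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        # room_id -> server, for rooms that were moved by hand (e.g. hot rooms)
        self.overrides = {}

    def server_for(self, room_id):
        """Return the server that should handle this room."""
        if room_id in self.overrides:
            return self.overrides[room_id]
        # Stable hash, so every router instance computes the same mapping.
        digest = hashlib.sha256(str(room_id).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]

    def move_room(self, room_id, server):
        """Pin a room to a specific server, overriding the hash-based choice."""
        if server not in self.servers:
            raise ValueError(f"unknown server: {server}")
        self.overrides[room_id] = server


router = RoomRouter(["server-a", "server-b", "server-c"])
default_server = router.server_for(4)
router.move_room(4, "server-c")
moved_server = router.server_for(4)
```

Note that this only changes where new connections are sent. The in-memory state of the moved room still has to be transferred to the new server and existing clients reconnected, which is where most of the complexity mentioned above comes from.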

On the other hand, if losing the state is not acceptable, you will need some sort of safe persistence for the room state. Some options would be:

1. A database: you mention that this would be inefficient, but you should double-check this point, just in case you missed a way of doing it that might work.
2. Redis, as mentioned by Hans-Martin.
3. Some sort of replicated in-memory database (e.g. Service Fabric Reliable Collections). It's as fast as storing the state in memory, but the data is replicated across multiple nodes, so if a server goes down your data doesn't get lost and the cluster recovers by itself. BTW, depending on your concurrency model for updating the room state, you could use the Actor model.

Options 1 and 2 are the simplest approaches. Because the servers are stateless, you can scale the solution very easily to manage load, without having to worry about data/load partitioning. You could also combine options 1 and 2, to get very reliable long-term data persistence with the database and fast reads with Redis.
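A rough sketch of what "stateless servers" means for options 1 and 2: each instance loads the room state from a shared store, applies the change, and writes it back, so any instance can serve any room. Everything here is an assumption for illustration: `InMemoryRoomStore` is a stand-in for the real database or Redis, and `AppInstance` and `post_message` are hypothetical names.

```python
import json


class InMemoryRoomStore:
    """Stand-in for a shared database or Redis (hypothetical, illustration only)."""

    def __init__(self):
        self._data = {}

    def load(self, room_id):
        raw = self._data.get(room_id)
        # A room that was never saved starts with an empty message list.
        return json.loads(raw) if raw is not None else {"messages": []}

    def save(self, room_id, state):
        # Serialize, as a real external store would hold bytes, not objects.
        self._data[room_id] = json.dumps(state)


class AppInstance:
    """One web app instance; it keeps no room state between calls."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def post_message(self, room_id, user, text):
        state = self.store.load(room_id)
        state["messages"].append({"user": user, "text": text, "via": self.name})
        self.store.save(room_id, state)
        return state


store = InMemoryRoomStore()
instance_a = AppInstance("a", store)
instance_b = AppInstance("b", store)

instance_a.post_message("room-1", "ann", "hi")
state = instance_b.post_message("room-1", "bob", "hello")
```

After these calls, `state["messages"]` holds both messages even though they went through different instances. The load/modify/save sequence shown here is not atomic, so with a real store you would need transactions or atomic operations to handle concurrent updates to the same room.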

Option 3 is a lot more complex to implement (you'll also need to manage partitioning), but it's probably the one that would give you the best performance with high reliability.

Other tips

Don't know if "best", but "possible": use a shared Redis. You need to map your room state and operations to the data types and operations supported by Redis, of course.

If your servers are geographically dispersed, you need to decide how to handle this. Some options are:

Use a single Redis instance and live with the network latency.
Use replicated Redis instances to improve read performance. Writes still need to go to the master instance.
Let each server cluster have its own Redis instance handling a subset of the rooms, and redirect clients to the appropriate server cluster.
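The last option, where each cluster owns a subset of the rooms, could be sketched roughly like this. It assumes a directory that records which cluster owns each room. The cluster names, the `example.invalid` URLs, and the `route` function are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str   # "serve" if this cluster owns the room, else "redirect"
    cluster: str  # the cluster that owns the room
    url: str      # where the client should connect


# Hypothetical cluster endpoints; replace with your own.
CLUSTER_URLS = {
    "eu": "wss://eu.example.invalid/rooms",
    "us": "wss://us.example.invalid/rooms",
}


def route(room_id, room_to_cluster, local_cluster):
    """Decide whether to serve a room locally or redirect to its owning cluster."""
    owner = room_to_cluster[room_id]
    action = "serve" if owner == local_cluster else "redirect"
    return Decision(action, owner, f"{CLUSTER_URLS[owner]}/{room_id}")


directory = {"r1": "eu", "r2": "us"}
local = route("r1", directory, "eu")
remote = route("r1", directory, "us")
```

Here `local.action` is `"serve"` and `remote.action` is `"redirect"`, with `remote.url` pointing at the EU cluster. How the directory itself is stored and kept consistent across clusters is left open.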

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange