How to architecture a realtime-heavy websockets-based web application?

https://softwareengineering.stackexchange.com/questions/327644

23-12-2020

Question

While developing a realtime Single Page Application, I have progressively adopted websockets to give my users up-to-date data. During this phase, I was dismayed to notice that I was destroying far too much of my app's structure, and I have not found a solution to this.

Before getting into specifics, just a bit of context:

- The webapp is a realtime SPA.
- The backend is in Ruby on Rails. Ruby pushes realtime events to a Redis key; a small Node server then pulls them back and pushes them out through Socket.IO.
- The frontend is in AngularJS and connects directly to the Socket.IO server in Node.

On the server side, before adding realtime features I had a clear controller/model separation of resources, with processing attached to each. This classical MVC design was completely shredded, or at least bypassed, as soon as I started pushing data to my users via websockets. I now have a single pipe through which my whole app sends more or less structured data, and I find it stressful.

On the front end, the main concern is duplication of business logic. When the user loads the page, I have to load my models through classical AJAX calls. But I also have to handle realtime data flooding in, and I find myself duplicating much of my client-side business logic to keep my client-side models consistent.

After some research, I haven't found any good posts, articles, or books that give advice on how one can and should design the architecture of a modern webapp with a few specific topics in mind:

- How to structure the data that is sent from the server to the user?
  - Should I only send events like "this resource has been updated and you should reload it via an AJAX call", or push the updated data and replace the data previously loaded via the initial AJAX calls?
  - How to define a coherent and scalable skeleton for the data sent? Is this a model update message, or a "there was an error with blahblahblah" message?
- How not to send data about everything from anywhere in the backend?
- How to reduce the business logic duplication both on the server and the client side?

Solution

How to structure the data that is sent from the server to the user?

Use the messaging pattern. You're already using a messaging protocol, but I mean structuring the changes as messages, specifically events. When something changes on the server side, it produces business events. In your scenario, your client views are interested in these events. The events should contain all data relevant to that change (not necessarily all view data). The client page should then use the event data to update the parts of the view it maintains.

For instance, if you were updating a stock ticker and AAPL changed, you wouldn't want to push all stock prices down, or even all the data about AAPL (name, description, etc.). You would only push AAPL, the delta, and the new price. On the client, you would then update only that stock price on the view.
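A minimal TypeScript sketch of that ticker case (TypeScript because the client in the question is JavaScript-based). The `StockPriceChanged` event, the `TickerRow` view model, and the prices are all made up for illustration; the point is that the message carries only the symbol, the delta, and the new price, and the client touches a single row.

```typescript
// Hypothetical event: only what changed travels over the socket.
interface StockPriceChanged {
  type: "StockPriceChanged";
  symbol: string;
  delta: number;
  newPrice: number;
}

// Hypothetical view model; name and price came from the initial page load.
interface TickerRow {
  symbol: string;
  name: string;
  price: number;
  lastDelta: number;
}

// Update only the row for the event's symbol; all other rows stay untouched.
function applyPriceChange(view: Map<string, TickerRow>, event: StockPriceChanged): void {
  const row = view.get(event.symbol);
  if (row === undefined) return; // this view does not show that symbol
  row.price = event.newPrice;
  row.lastDelta = event.delta;
}

// Usage with made-up prices.
const ticker = new Map<string, TickerRow>([
  ["AAPL", { symbol: "AAPL", name: "Apple", price: 100, lastDelta: 0 }],
  ["MSFT", { symbol: "MSFT", name: "Microsoft", price: 200, lastDelta: 0 }],
]);
applyPriceChange(ticker, { type: "StockPriceChanged", symbol: "AAPL", delta: 1.5, newPrice: 101.5 });
```

Static fields such as `name` never travel with the event; they stay whatever the initial load provided.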

Should I only send events like "this resource has been updated and you should reload it via an AJAX call" or push the updated data and replace previous data loaded via initial AJAX calls?

I would say neither. If you are sending the event, go ahead and send the relevant data with it (not the whole object's data). Give it a name for the kind of event it is. (The naming, and which data is relevant to that event, are beyond the scope of the system's mechanics; they have more to do with how the business logic is modeled.) Your view updaters need to know how to translate each specific event into a precise view change (i.e. only update what changed).
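A sketch of view updaters keyed by event name, assuming a hypothetical order list view with `OrderPlaced`, `OrderShipped`, and `OrderCancelled` events (names and payloads are illustrative, not from the question). Each case changes only the fields its event describes, and the `assertNever` helper makes the TypeScript compiler reject the switch if an event type is left unhandled.

```typescript
// Hypothetical events for an order list view; names and payloads are illustrative.
type OrderEvent =
  | { type: "OrderPlaced"; orderId: string; total: number }
  | { type: "OrderShipped"; orderId: string; carrier: string }
  | { type: "OrderCancelled"; orderId: string };

interface OrderRow {
  orderId: string;
  total: number;
  status: "placed" | "shipped" | "cancelled";
  carrier?: string;
}

// Compile-time guard: the call below fails to type-check if a case is missing.
function assertNever(x: never): never {
  throw new Error(`Unhandled event: ${JSON.stringify(x)}`);
}

// Translate each named event into the smallest matching view change.
function applyOrderEvent(view: Map<string, OrderRow>, event: OrderEvent): void {
  switch (event.type) {
    case "OrderPlaced":
      view.set(event.orderId, { orderId: event.orderId, total: event.total, status: "placed" });
      return;
    case "OrderShipped": {
      const row = view.get(event.orderId);
      if (row) {
        row.status = "shipped";
        row.carrier = event.carrier;
      }
      return;
    }
    case "OrderCancelled": {
      const row = view.get(event.orderId);
      if (row) row.status = "cancelled";
      return;
    }
    default:
      assertNever(event);
  }
}

const orders = new Map<string, OrderRow>();
applyOrderEvent(orders, { type: "OrderPlaced", orderId: "o-1", total: 42 });
applyOrderEvent(orders, { type: "OrderShipped", orderId: "o-1", carrier: "ACME" });
```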

How to define a coherent and scalable skeleton for the data sent? Is this a model update message, or a "there was an error with blahblahblah" message?

I would say this is a large, open-ended question that should be broken up into several other questions and posted separately.

In general, though, your back-end system should create and dispatch events for happenings that are important to your business. Those could come from external feeds or from activity in the back end itself.
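One possible skeleton, sketched in TypeScript under the assumption that every websocket frame is JSON: a `kind` discriminator separates business events from errors, so the client never has to guess from the payload. The field names (`kind`, `name`, `seq`, `data`, `code`, `message`) are hypothetical, not a standard.

```typescript
// Hypothetical envelope: a "kind" field tells events and errors apart.
interface EventMessage {
  kind: "event";
  name: string; // business event name, e.g. "OrderShipped"
  seq: number; // position in the server's event stream
  data: Record<string, unknown>; // only the data relevant to this event
}

interface ErrorMessage {
  kind: "error";
  code: string; // machine-readable error code
  message: string; // human-readable explanation
}

type ServerMessage = EventMessage | ErrorMessage;

// Parse one frame; anything that does not match the skeleton yields null.
function parseMessage(raw: string): ServerMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;
  const { kind, name, seq, data, code, message } = value as Record<string, unknown>;
  if (kind === "event" && typeof name === "string" && typeof seq === "number" &&
      typeof data === "object" && data !== null) {
    return { kind: "event", name, seq, data: data as Record<string, unknown> };
  }
  if (kind === "error" && typeof code === "string" && typeof message === "string") {
    return { kind: "error", code, message };
  }
  return null;
}

const ok = parseMessage('{"kind":"event","name":"OrderShipped","seq":7,"data":{"orderId":"o-1"}}');
const err = parseMessage('{"kind":"error","code":"not_found","message":"Order o-9 does not exist"}');
const bad = parseMessage("not json");
```

New event types then only add new `name` values; the envelope itself stays stable.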

How not to send data about everything from anywhere in the backend?

Use the publish/subscribe pattern. When your SPA loads a new page that is interested in receiving real-time updates, the page should subscribe only to the events it can use, and call the view update logic as those events come in. You will probably need pub/sub logic on the server to reduce the network load. Libraries exist for WebSocket pub/sub, but I'm not sure what those are in the Rails ecosystem.
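A minimal in-memory topic broker illustrating the server-side pub/sub idea. This is a sketch, not any particular library's API; the topic strings (such as `orders:42`) and the `Subscriber` callback type are assumptions. In a real server each subscriber would write to one client socket, and `unsubscribe` would run when the page is left or the socket closes.

```typescript
// Each subscriber is a callback; in a real server it would write to one socket.
type Subscriber = (topic: string, payload: unknown) => void;

class Broker {
  private readonly topics = new Map<string, Set<Subscriber>>();

  // Register interest in one topic; returns a function that cancels it.
  subscribe(topic: string, subscriber: Subscriber): () => void {
    const subs = this.topics.get(topic) ?? new Set<Subscriber>();
    this.topics.set(topic, subs);
    subs.add(subscriber);
    return () => {
      subs.delete(subscriber);
      // Drop empty topics, but only if this set is still the registered one.
      if (subs.size === 0 && this.topics.get(topic) === subs) this.topics.delete(topic);
    };
  }

  // Deliver to subscribers of this topic only; returns the number reached.
  publish(topic: string, payload: unknown): number {
    const subs = this.topics.get(topic);
    if (subs === undefined) return 0;
    subs.forEach((s) => s(topic, payload));
    return subs.size;
  }
}

const broker = new Broker();
const received: string[] = [];
const unsubscribe = broker.subscribe("orders:42", (topic, payload) =>
  received.push(`${topic} ${JSON.stringify(payload)}`));
const reached = [
  broker.publish("orders:42", { status: "shipped" }), // delivered
  broker.publish("orders:7", { status: "placed" }), // nobody subscribed: dropped
];
unsubscribe();
reached.push(broker.publish("orders:42", { status: "cancelled" })); // no longer delivered
```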

How to reduce the business logic duplication both on the server and the client side?

It sounds like you are having to update the view data on both the client and the server. My guess is that you need the server-side view data so that you have a snapshot to get the real-time client started. Since two languages/platforms are involved (Ruby and JavaScript), the view update logic will have to be written in both. Aside from transpiling (which has its own issues), I don't see a way around that.

Technical point: Data manipulation (view update) is not business logic. If you mean use case validation, then that seems unavoidable since the client's validations are necessary for good user experience, but cannot ultimately be trusted by the server.

Here is how I see such a thing structured well.

Client Views:

- Requests a view snapshot and the view's last seen event number
  - This will prepopulate the view so the client doesn't have to build from scratch.
  - Could be over HTTP GET for simplicity
- Makes a websocket connection and subscribes to specific events, starting from the view's last event number.
- Receives events over websocket and updates its view based on event type/data.
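These client-view steps can be sketched as a small TypeScript class, assuming the server numbers the events for a view consecutively (1, 2, 3 and so on) and returns, with the snapshot, the number of the last event already included in it. `LiveView`, `Snapshot`, and `SeqEvent` are hypothetical names. Events already covered by the snapshot are skipped, and a gap in the numbering tells the caller to reload the snapshot rather than apply events out of order.

```typescript
// Snapshot from a hypothetical HTTP GET: view state plus last event folded in.
interface Snapshot<S> {
  state: S;
  lastSeq: number;
}

// An event as it arrives over the websocket, tagged with its sequence number.
interface SeqEvent<E> {
  seq: number;
  event: E;
}

class LiveView<S, E> {
  private state: S;
  private lastSeq: number;
  private readonly reduce: (state: S, event: E) => S;

  constructor(snapshot: Snapshot<S>, reduce: (state: S, event: E) => S) {
    this.state = snapshot.state;
    this.lastSeq = snapshot.lastSeq;
    this.reduce = reduce;
  }

  // Sent when subscribing, so the server can replay every event after it.
  get resumeAfter(): number {
    return this.lastSeq;
  }

  get current(): S {
    return this.state;
  }

  // Returns false when an event is missing; the caller should then fetch
  // a fresh snapshot.
  receive(msg: SeqEvent<E>): boolean {
    if (msg.seq <= this.lastSeq) return true; // already in the snapshot: skip
    if (msg.seq !== this.lastSeq + 1) return false; // gap detected
    this.state = this.reduce(this.state, msg.event);
    this.lastSeq = msg.seq;
    return true;
  }
}

// Usage: a counter view whose events are increments.
const counter = new LiveView<number, number>({ state: 10, lastSeq: 5 }, (s, d) => s + d);
const results = [
  counter.receive({ seq: 5, event: 100 }), // true: covered by the snapshot, ignored
  counter.receive({ seq: 6, event: 1 }), // true: applied, state is now 11
  counter.receive({ seq: 8, event: 1 }), // false: event 7 is missing
];
```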

Client Commands:

- Requests a data change (HTTP PUT/POST/DELETE)
  - Response is only success or failure + error
  - (The event(s) generated by the change will come over websocket and trigger a view update.)
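A sketch of the command side, assuming a hypothetical `Send` function that wraps the HTTP call (in a browser it could be built on `fetch`); the `/tasks` URL and the fake transport are made up so the example runs without a network. The result carries only success or failure; on success the caller leaves the view alone and waits for the corresponding event on the websocket.

```typescript
// Outcome of a command: no resource data, just success or failure.
type CommandResult = { ok: true } | { ok: false; status: number; error: string };

type Method = "POST" | "PUT" | "DELETE";

// Hypothetical transport; a browser version could wrap fetch() and read the text.
type Send = (method: Method, url: string, body?: unknown) => Promise<{ status: number; text: string }>;

async function sendCommand(send: Send, method: Method, url: string, body?: unknown): Promise<CommandResult> {
  const res = await send(method, url, body);
  if (res.status >= 200 && res.status < 300) {
    // Success: do not touch the view here; the resulting event(s) will
    // arrive over the websocket and drive the update.
    return { ok: true };
  }
  return { ok: false, status: res.status, error: res.text || `HTTP ${res.status}` };
}

// Fake transport: accepts non-empty titles, rejects empty ones with 422.
const fakeSend: Send = async (_method, _url, body) => {
  const title = (body as { title?: string } | undefined)?.title ?? "";
  return title.trim() === ""
    ? { status: 422, text: "title must not be empty" }
    : { status: 204, text: "" };
};
```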

The server side could actually be broken up into several components with limited responsibilities. One just processes the incoming requests and creates events. Another manages client subscriptions, listens for events (say, in-process), and forwards the appropriate events to subscribers. A third could listen for events and update server-side views; this might even happen before subscribers receive the events.
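That split can be sketched as an in-process event bus with listeners called in order. Everything here (`EventBus`, `handleRename`, the `ItemRenamed` event, the `outbox` standing in for client sockets) is hypothetical, and subscription filtering is left out for brevity. Because listeners run in registration order, registering the server-side view updater first means snapshots are already current when an event reaches subscribers.

```typescript
// The only event in this sketch; a real system would have a union of these.
interface ItemRenamed {
  seq: number;
  name: "ItemRenamed";
  data: { id: string; name: string };
}

type Listener = (event: ItemRenamed) => void;

// In-process bus: numbers events and notifies listeners in registration order.
class EventBus {
  private readonly listeners: Listener[] = [];
  private nextSeq = 1;

  listen(listener: Listener): void {
    this.listeners.push(listener);
  }

  emit(name: "ItemRenamed", data: { id: string; name: string }): ItemRenamed {
    const event: ItemRenamed = { seq: this.nextSeq++, name, data };
    this.listeners.forEach((l) => l(event));
    return event;
  }
}

const bus = new EventBus();

// Server-side view updater, registered first; it also tracks the snapshot seq.
const itemNames = new Map<string, string>();
let snapshotSeq = 0;
bus.listen((e) => {
  itemNames.set(e.data.id, e.data.name);
  snapshotSeq = e.seq;
});

// Subscription forwarder: here it only queues frames bound for client sockets.
const outbox: string[] = [];
bus.listen((e) => {
  outbox.push(JSON.stringify(e));
});

// Request handler: validates and emits events; knows nothing about views or sockets.
function handleRename(id: string, name: string): void {
  if (name.trim() === "") throw new Error("name must not be empty");
  bus.emit("ItemRenamed", { id, name });
}

handleRename("a1", "Quarterly report");
```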

What I have described is a form of CQRS + Messaging, a typical strategy for addressing the kind of issues you are facing.

I didn't bring Event Sourcing into this description as I'm not sure if it's something you want to take on or if you need it necessarily. But it is a related pattern.

Other tips

After a few months of work, mainly on the backend, I have been able to use some of the advice here to address the problems the platform was facing.

The main objective when rethinking the backend was to stick to CRUD as closely as possible. All the actions, messages, and requests scattered across many routes were regrouped into resources that are created, read, updated, or deleted. It sounds obvious now, but this way of thinking was very difficult to apply carefully.

Once everything had been organised into resources, I was able to attach realtime messages to models.

- Creation triggers a message with the whole new resource;
- Update triggers a message with only the updated attributes (plus the UUID);
- Deletion triggers a deletion message.
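The three CRUD messages map naturally onto a discriminated union. This TypeScript sketch assumes hypothetical field names (`action`, `uuid`, `resource`, `changes`) and a made-up `Task` resource, and shows how a client-side component could apply the messages to the resources it holds, keyed by UUID.

```typescript
// Hypothetical wire format for the three CRUD messages.
type ResourceMessage<T> =
  | { action: "created"; uuid: string; resource: T }
  | { action: "updated"; uuid: string; changes: Partial<T> }
  | { action: "deleted"; uuid: string };

// Apply one message to the resources a component holds, keyed by UUID.
function applyResourceMessage<T extends object>(store: Map<string, T>, msg: ResourceMessage<T>): void {
  switch (msg.action) {
    case "created":
      store.set(msg.uuid, msg.resource);
      break;
    case "updated": {
      const current = store.get(msg.uuid);
      // Merge only the changed attributes; ignore resources not loaded here.
      if (current !== undefined) store.set(msg.uuid, { ...current, ...msg.changes });
      break;
    }
    case "deleted":
      store.delete(msg.uuid);
      break;
  }
}

interface Task {
  title: string;
  done: boolean;
}

const tasks = new Map<string, Task>();
applyResourceMessage<Task>(tasks, { action: "created", uuid: "u1", resource: { title: "Write docs", done: false } });
applyResourceMessage<Task>(tasks, { action: "updated", uuid: "u1", changes: { done: true } });
const afterUpdate = tasks.get("u1");
applyResourceMessage<Task>(tasks, { action: "deleted", uuid: "u1" });
```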

On the REST API, all create, update, and delete methods return a head-only response: the HTTP status code reports success or failure, and the actual data is pushed over websockets.

On the front end, each resource type is handled by a specific component that loads its resources through HTTP on initialisation, then subscribes to updates and maintains their state over time. Views then bind to these components to display resources and perform actions on them through the same components.

I found the reading on CQRS + Messaging and Event Sourcing very interesting, but felt it was a bit overcomplicated for my problem and perhaps better suited to intensive applications where committing data to a centralised database is an expensive luxury. But I will definitely keep this approach in mind.

In this case, the app will have few concurrent clients, so I chose to rely heavily on the database. The most frequently changing models are stored in Redis, which I trust to handle a few hundred updates per second.

Licensed under: CC-BY-SA with attribution
