Question

A few years ago, I wrote an application that allowed users to upload a file (of a very specific type) to a server. It then sends instructions to the server on how to visualise the data, and the server presents the data as either a dot plot or a histogram. The PNG is constructed on the server side and streamed back to the user. Here's an example of two graphs produced by my application:

[example image: a dot plot and a histogram]

The technology I used was NodeJS and MongoDB. I'm now revisiting the application as I've received many requests from users to add new features, and some complaints about how slow it is to use.

There are a few issues with the stack. When a user wants a graph, an HTTP GET request is made to the server, and a lot of heavy computational work happens inside that request handler. The data is looped through, and the graph is constructed by calculating the position of each data point on the graph; this calculation is the most computationally heavy part. I won't go into detail here, but each data point is run through a long formula to find its correct position (and some files contain over 1,000,000 data points). Because I'm using Node, all other requests are blocked while this happens inside the GET request, so my application can only handle one request at a time.

I'm looking for suggestions on how to re-architect it to handle multiple requests concurrently and to draw the graphs more quickly (presumably this is all about increasing CPU power). One high-level approach I'm considering is:

  1. User makes a request for a graph
  2. Node takes the request and notifies an AWS Lambda function (and HTTP request ends here) which does all the heavy computation work and produces a PNG
  3. PNG is streamed back to the user (this I'm not sure how to do).

All of this has to happen within a few seconds - the user is waiting for a graph to appear. I'm not sure my suggested approach would be very user friendly, as AWS Lambda needs time to start up and the user may be waiting around a long time for the graph.

Any suggestions would be greatly appreciated.

Solution

Node.js is good for I/O-intensive tasks, but not for CPU-intensive ones: its event loop runs on a single thread.

There are several possible optimizations to address this issue:

One server

As others mentioned, you should definitely run the CPU-heavy task separately from the request-handling part.

  • Cache the rendered image if a group of graphs is requested very often.
  • Run the image computation in another process with the child_process module. You can create multiple processes, but you will need to benchmark the "best" number; with multiple processes, use a pool that assigns each task to a child process.
  • To make each process more productive on a multi-core system, the webworker-threads module provides an asynchronous API for CPU-bound tasks that Node.js itself lacks (newer Node.js versions ship a built-in worker_threads module serving the same purpose); that could be the next module to consider.
  • To accelerate the image computations, you may consider a GPU library to parallelize the work.
  • You may further separate the tasks by splitting your application in two: one service (service_1) handles requests and polls for the response image, and another (service_2) just handles the computations. Using AWS Lambda is fine, but if you run the computations inside the Lambda handler, they may be limited by resources (RAM, CPU). An alternative is AWS SQS: service_2 picks request messages from the queue and computes the image. After the computation is done, the image data is stored in a database (Redis, for example, with an expiry setting), and a notification is sent, again via SQS. Service_1 receives the notification (note that SQS is poll-based; for topic-style push you would use SNS, or simply poll the queue periodically if the RPS is not too high), loads the image data from the database, and renders the image.

Multiple servers

It is worth profiling a single server with a load test to better understand how many concurrent requests it can handle. If you have to handle more requests than your benchmark indicates, add more instances, deploy your application to all the servers, and put a load balancer in front to forward each request to one server.

OTHER TIPS

You definitely want to separate the computationally intensive part into its own service. I would even separate it from the part that generates the PNG image. The important part here is to make the process asynchronous. While the numbers are being crunched, serve a PNG image that basically says "please wait," or return a 202 Accepted response.

NodeJS might not be the best technology choice for the CPU-intensive parts. Since the process is visual in nature, consider choosing a technology that scales well with additional CPUs, or even GPUs if available. Make spinning up new threads to handle multiple users as quick as possible.

Once the raw data for the graph is ready, push it to another service that creates the PNG file.

When in doubt, separate the I/O and CPU intensive tasks from the code that returns data to the user. Plan for eventual consistency. Choose the technology that best fits the task, and can be scaled well.

The graph should be rendered in a separate process. Your response to the GET request should now look like this:

  • If the requested graph is in the cache, serve that cached graph image.
  • Otherwise, check if the job queue contains a job to generate that image. If not, add such a job to the job queue.
  • Serve some kind of "please wait" content.

All the task in the job queue needs to do is generate the image and place it in the cache.
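The cache-or-enqueue logic above can be sketched as follows. The in-memory Maps and the names here are illustrative; a real deployment would back the cache with Redis or similar and have a separate worker process drain the queue.

```javascript
const cache = new Map();    // graphId -> PNG buffer
const pending = new Set();  // graphIds already queued for generation
const jobQueue = [];        // drained by a separate worker process

function handleGraphRequest(graphId) {
  // 1. If the requested graph is cached, serve it.
  if (cache.has(graphId)) {
    return { status: 200, body: cache.get(graphId) };
  }
  // 2. Otherwise make sure exactly one generation job is queued for it.
  if (!pending.has(graphId)) {
    pending.add(graphId);
    jobQueue.push(graphId);
  }
  // 3. Serve "please wait" content in the meantime.
  return { status: 202, body: 'please wait, generating' };
}

// The worker's only job: generate the image and place it in the cache.
function completeJob(graphId, pngBuffer) {
  cache.set(graphId, pngBuffer);
  pending.delete(graphId);
}
```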

All this has to happen within a few seconds - the user is waiting for a graph to appear. I'm not sure that my suggested approach would be very user friendly as AWS Lambda needs time to start-up and the user may be waiting around a long time for the graph.

It's always been the case that the user has had to wait for their graph to be generated. Generating it in a separate process will inevitably add some overhead. But there is no reason that overhead needs to be large, and the alternative would be to make other users wait even longer to see their graphs.

Assuming the graph image is displayed in some web page that you also have control over, the page can contain some JavaScript that periodically requests this GET resource via XHR, displaying only "please wait, generating" until the final finished image is available.
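The client-side polling can be sketched like this. The fetch function is injected as a parameter so the logic is testable; in the browser you would pass the global fetch, and the URL and interval are illustrative.

```javascript
// Poll the graph endpoint until the server stops answering 202 Accepted.
async function pollForGraph(url, fetchFn, intervalMs = 1000) {
  for (;;) {
    const response = await fetchFn(url);
    if (response.status !== 202) {
      return response; // finished (or failed) - caller displays the image
    }
    // Still generating: show "please wait" and retry after a pause.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

In the page, you would call something like `pollForGraph('/graphs/123.png', fetch)` and swap the placeholder image for the result once the promise resolves.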

Once the graph is being generated in a separate process, you can work on making an optimized version of that process that generates the graph faster, perhaps in a different programming language more suited to numerical computing.

I'm looking for suggestions on how to re-architecture it to handle multiple requests concurrently and to draw the graphs quicker (presumably this is all about increasing CPU power).

Other than making your application faster, the straightforward way to handle more requests is to add more workers (instances of your Node.js application) on more machines - basically, putting a load balancer in front to receive all the requests and forward them to the worker backends.

If the problem really was too many requests, that should allow the system to scale nicely.

Does it really need to be a lambda function? Assuming you're currently running your web app on one server, and the server has more than one core, you can start a separate thread or a child process to do the calculations. If you need more than one server, it gets more complicated, but if you don't, then why overcomplicate it?


I'd probably make it a child process, because that creates the maximum amount of decoupling; you could then even rewrite the calculation program in a different language. Essentially, your NodeJS web server runs a command in the background and sends the result back to the user when it's done. While the command is running, your web server can still process other web requests - you get a callback when it finishes.

Using separate threads or processes will automatically allow you to use all your server's CPU cores - one per calculation.

If your calculation uses a lot of memory and many calculations may arrive at the same time, you might want to make a queue inside your web server, instead of trying to run them all at once, so you don't run out of memory.
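Such an in-server queue can be sketched as a small concurrency limiter. The limit of 2 and the task shape are illustrative; each queued task is just a function returning a promise (e.g. one wrapping a child process).

```javascript
// Cap how many calculations run at once; extra tasks wait in line.
function createLimiter(maxConcurrent) {
  let running = 0;
  const waiting = [];

  function next() {
    if (running >= maxConcurrent || waiting.length === 0) return;
    running += 1;
    const { task, resolve, reject } = waiting.shift();
    task()
      .then(resolve, reject)
      .finally(() => { running -= 1; next(); }); // free a slot, start the next
  }

  // Returns a promise that settles when the task eventually runs.
  return function limit(task) {
    return new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
  };
}
```

Usage would look like `const limit = createLimiter(2); limit(() => renderGraph(file))` - at most two renders run concurrently, and the rest queue up instead of exhausting memory.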

Licensed under: CC-BY-SA with attribution