Question

Say for example I've got a RESTful webservice, and I need to support creating a widget via a POSTed http request. I need to

  1. Deserialize the POSTed widget.
  2. Validate the deserialized widget.
  3. Persist the widget to a backing service.
  4. Send a success response back to the client.

Now, the problem is that my SLA does not allow 3 to block 4--persistence in this case takes too long and the client needs to know right away that the request has succeeded (this is the standard use case for the 202 http status code).

Almost everywhere I look, the assumption is that the way to solve this is to "background" the expensive persistence part. This is typically an awkward process with many moving parts and often its own latency (e.g. making a blocking call to a separate queuing service).

Simply parallelizing 3 and 4 using native constructs is generally out of the question, as is switching the order so that 4 blocks 3. As near as I can tell, this is mainly because web and app servers are built on the fundamental assumption that a process (and any children it has forked off) is free to be killed/reused as soon as it's sent its response. Or, equivalently, that the response can't be sent until the app has finished doing everything it's going to do.

This is intensely frustrating to me! In any other context, I can do what I like with the program's control flow. But when I'm running a Phusion Passenger -> Ruby on Rails setup, and I want to do a thing after I send the response, I'm left with a wide variety of baroque options that all seem to consider it perfectly natural and acceptable to, say, serialize the application state, post it to Amazon SQS, and have a basically separate web service poll SQS, deserialize the old application state, and then do the thing. I had application state all set up the way I wanted it after sending the response! Why isn't anything written so I can just do

def create
  widget = Widget.new(params[:widget])
  if widget.valid?
    respond_with(widget)  # send the success response first...
    widget.save           # ...then persist afterwards
  else
    respond_with(widget.errors)
  end
end

Why is there a pervasive assumption that web service stacks will never support this kind of flow? Is there a hidden drawback, or tradeoff, to making it possible to do this?


Solution

Now, the problem is that my SLA does not allow 3 to block 4--persistence in this case takes too long and the client needs to know right away that the request has succeeded (this is the standard use case for the 202 http status code).

But until the state change is persisted, you can't know that you're successful. You might have a sudden problem with electrical power and an errant backhoe (this stuff happens!) and then when the service resumes, it's completely forgotten about the resource. The client comes back and the server has no idea what it's talking about. That's not good.

No, you need to speed up commits (and avoid unnecessary processing in the critical path).

You may need to think about what it means to commit, and how you can logically commit faster: you can commit the fact that there is work to do, which is much quicker than committing the result of the work. Then when the user comes back, either the processing is done, in which case you can redirect (303 See Other) to the results, or you can give a result that says why things are still processing or that they've failed.
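
For instance, a minimal sketch of that idea in a Rails controller (the Task model, its status and payload columns, and task_url are all hypothetical):

# Commit only a lightweight "work to do" record; this commits quickly
# because no expensive processing happens on the request path.
def create
  task = Task.create!(status: "pending", payload: params[:widget].to_json)
  head :accepted, location: task_url(task)  # 202 Accepted plus a status URL
end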

Getting faster commits might mean thinking more carefully about how you deploy. Are you using the right database choice? Is that database hosted on the right hardware? (Commit-heavy loads are much faster when you've got an SSD to host the transaction log on.) I know it's nice to ignore these things and just deal with the data model at an abstract level, but performance is one of those things where the underlying details have a habit of leaking through.


In response to comments clarifying…
If you've got a genuinely expensive task to perform (e.g., you've asked for a collection of large files to be transferred from some third party) you need to stop thinking in terms of having the whole task complete before the response comes back to the user. Instead, make the task itself be a resource: you can then respond quickly to the user to say that the task has started (even if that is a little lie; you might just have queued the fact that you want the task to start) and the user can then query the resource to find out whether it has finished. This is the asynchronous processing model (as opposed to the more common synchronous processing model).
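
Continuing the sketch from above (the Task model, its status field, and result_url are assumptions), the status endpoint might look roughly like:

def show
  task = Task.find(params[:id])
  case task.status
  when "done"
    redirect_to task.result_url, status: :see_other  # 303 to the finished result
  when "failed"
    render json: { status: "failed", error: task.error_message }
  else
    render json: { status: task.status }  # still processing; poll again later
  end
end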

There are many ways to handle letting the user know that the task has finished. By far the simplest is to just wait until they poll and tell them then. Alternatively, you can push a notification somewhere (maybe an Atom feed or by sending an email?) but these are much trickier in general; push notifications are unfortunately relatively easy to abuse. I really advise sticking to polling (or using cleverness with websockets).
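
On the client side, polling can be as simple as this sketch using Ruby's standard Net::HTTP (the task URI is assumed to come from the 202 response's Location header):

require "net/http"
require "json"

def wait_for(task_uri, interval: 2)
  loop do
    res = Net::HTTP.get_response(URI(task_uri))
    return res["Location"] if res.is_a?(Net::HTTPSeeOther)  # 303: finished
    raise "task failed" if JSON.parse(res.body)["status"] == "failed"
    sleep interval  # still pending; try again shortly
  end
end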

The case where a task might be quick or might be slow, and the user has no way to know ahead of time, is really evil. Unfortunately, the sane way to resolve it is to make the processing model asynchronous. It's possible to flip between the two with REST/HTTP (send either a 200 or a 202, where the 202's content includes a link to the processing-task resource), but it is quite tricky. I don't know if your framework supports such things nicely.
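
A hedged sketch of that flip in a Rails controller (fast_path?, the Task model, and task_url are all hypothetical names):

def create
  widget = Widget.new(params[:widget])
  return render json: widget.errors, status: :unprocessable_entity unless widget.valid?
  if widget.fast_path?  # hypothetical check: work known to be cheap
    widget.save!
    render json: widget, status: :ok  # 200: finished synchronously
  else
    task = Task.create!(payload: widget.attributes.to_json)
    render json: { task: task_url(task) }, status: :accepted  # 202 + task link
  end
end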

Be aware that most users really do not understand asynchronous processing. They do not understand having to poll for results. They do not appreciate that a server can be doing things other than handling what they asked it to do. This means that if you are serving up an HTML representation, you probably ought to include some JavaScript in it to do the polling (or to connect to a websocket, or whatever you choose) so that they don't need to know about page refreshing or details like that. That way you can go a long way towards pretending that the server is just doing what they asked, but without all the problems associated with actual long-running requests.

OTHER TIPS

I think you need something similar to JMS (but in Ruby, or whatever you are using; the question was not completely clear).

The idea here is that your front-end web service can quickly pass a message off to another service that says "persist this data." That service will save the message to a persistent queue and return immediately. The web service can return right away without chewing up more server resources.
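
For example, with the Bunny gem and a RabbitMQ broker (a sketch; the queue name and connection details are assumptions), the front end just publishes and returns:

require "bunny"
require "json"

conn = Bunny.new  # assumes a broker on localhost with default credentials
conn.start
channel = conn.create_channel
queue = channel.queue("widgets.persist", durable: true)

# Publish durably; the broker stores the message and we return immediately.
channel.default_exchange.publish(
  { widget: { name: "sprocket" } }.to_json,
  routing_key: queue.name,
  persistent: true
)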

Then, in the background, a queue processor picks up messages in the queue and dispatches them. In this case, it would dispatch to a persistence service, which would take the raw message and take whatever steps are necessary to save it to its destination.
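
The queue-processor side is the mirror image (again a Bunny sketch; Widget.create! stands in for whatever your persistence service actually does):

require "bunny"
require "json"

conn = Bunny.new
conn.start
channel = conn.create_channel
queue = channel.queue("widgets.persist", durable: true)

# Block, handling one message at a time; ack only after a successful save.
queue.subscribe(manual_ack: true, block: true) do |delivery_info, _props, body|
  data = JSON.parse(body)
  Widget.create!(data["widget"])  # the domain-specific persistence work
  channel.ack(delivery_info.delivery_tag)
end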

The difference between saving to a queue and saving to your application's database is this. The queue might use JSON or XML or another interchange format, and it does not care much about the payload. It cares about what service or services need the message and when it was put in the queue. Your persistence service gets the message and cracks open the payload to do whatever domain-specific processing is necessary.
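
Concretely, such a message might be a small envelope like this (the field names are illustrative, not any standard):

require "json"
require "time"

message = {
  destination: "persistence-service",        # which service should handle it
  enqueued_at: Time.now.utc.iso8601,         # when it entered the queue
  payload: { widget: { name: "sprocket" } }  # opaque to the queue itself
}.to_json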

Is there a hidden drawback, or tradeoff, to making it possible to do this?

Yes. The benefit of constraining everything to a request-response cycle is that the framework and web server can automatically free up any resources held by the request: memory, network connections, database transactions and connections, processes, file handles, worker processes, and so on. Writing high-quality long-lived daemons/servers is very hard, because any resource leak in a long-lived process amplifies over time and can bring a server that has been running fine for weeks to a halt for no apparent reason. Tying resources to the request-response cycle allows automatic cleanup not only when your code is on the happy path, but also when it hits rare, unexpected exceptions. This reduces the burden on the application developer; cleanup can instead be left to the framework, where it can be done in a systematic, consistent manner.

Also, having requests continue to be processed after the response finishes opens up the possibility that the request will fail after the response has been sent (say, the persistence server has gone missing, or its disk is full, or your web server runs out of memory and is ungracefully terminated by the kernel's OOM killer), and since you're now outside the request-response cycle, the error can no longer be delivered to the user. You can certainly engineer your way out of these situations, say by creating a status resource the user agent can check for Accepted-but-not-completed requests. That requires a lot of thought and engineering, but it's exactly how 202 is meant to be used. Note that 202 does not specify how the status should be checked: a 202 response might ask the client to poll a status resource, or to open a WebSocket connection to it, etc. You're free to do what's best for your situation.

Many of the resources that can leak in long-lived daemons just aren't obvious, and writing your application in a high-level language with a garbage collector doesn't solve all of them.

Going back to your question:

Is there a hidden drawback, or tradeoff, to making it possible to do this?

When you break out of the request-response cycle, you are on your own. While it is certainly possible to write a server that doesn't follow the request-response cycle, doing so is extremely hard and requires a lot of engineering. There are a limited number of architectural patterns that can solve the problems of long-lived daemons satisfactorily; request-response is one of the simplest and easiest to use, and simple == good.

There are other architectures that let you do work after the request-response cycle has finished while still playing nicely with the request-response paradigm. These usually revolve around message passing: after persisting the data, and just before you finish the response, you send a command to worker processes to process the data that has just been submitted, usually through a message queue service. This gives you a clean separation between the web server processes, which handle short, interactive requests that need to respond as quickly as possible, and the background workers, which take a long time to finish and are much less sensitive to delays. (Interesting tidbit: many process schedulers inside kernels have heuristics to detect whether a process looks interactive or batch-oriented and schedule them differently; the former usually gets priority for low-latency wakeups, the latter gets longer time slices.)
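
In Rails, ActiveJob is the usual way to express this pattern (a sketch; PersistWidgetJob is hypothetical, and the queue backend, e.g. Sidekiq, is configured elsewhere):

class PersistWidgetJob < ApplicationJob
  queue_as :default

  def perform(widget_attrs)
    Widget.create!(widget_attrs)  # the slow persistence, now off the request path
  end
end

# In the controller: enqueue the job, then finish the response immediately.
def create
  PersistWidgetJob.perform_later(params.require(:widget).permit!.to_h)
  head :accepted
end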

TL;DR. The constraint of request-response cycle is a good thing, try to work with it rather than against it.

the response can't be sent until the app has finished doing everything it's going to do.

That isn't correct. Most decent web frameworks also support HTTP chunked transfer encoding, which can return partial data to the client while the rest of the data is still being computed.
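
For example, any Rack-compliant server can stream a body that yields in pieces (a minimal sketch; whether the bytes actually go out chunked depends on the server once no Content-Length is set):

# config.ru -- a response body that yields pieces as they become ready
class SlowBody
  def each
    yield "first part\n"   # can be flushed to the client right away
    sleep 2                # stand-in for the slow remainder of the work
    yield "second part\n"
  end
end

run ->(env) { [200, { "Content-Type" => "text/plain" }, SlowBody.new] }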

Nothing is stopping a web service from continuing to process data after a response is sent. Your problem is probably down to your choice of framework forcing you into a simplistic request-response-done interaction.

Your service could start a thread and send it data to work on, or pass it (asynchronously) to another service to work on.
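
In Ruby that can be as little as the following sketch; note, though, that a preforking server like Passenger may recycle the process before the thread finishes, and new threads need their own database connections, which is exactly why frameworks push you towards queues:

def create
  widget = Widget.new(params[:widget])
  if widget.valid?
    render json: widget, status: :accepted  # respond first...
    Thread.new { widget.save }              # ...then persist in the background
  else
    render json: widget.errors, status: :unprocessable_entity
  end
end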

You could implement a websocket and send multiple responses to the client as data becomes available.

The issues you might have are generally down to scalability. What if your background worker takes a long time to process, or every request spins off a thread? You may end up with so much background work being done that there are far fewer resources available for client requests.

I think the problem here stems from the fact that the client now assumes the widget exists and will be available, even though it's entirely possible for the save to have failed for any number of reasons. This leaves your front end and back end in an inconsistent state. Also, REST is Representational State Transfer; if you aren't actually transferring an accurate representation of the state, it's not really REST.

Licensed under: CC-BY-SA with attribution