Question

While browsing the code of an Erlang application, I came across an interesting design problem. Let me describe the situation, although I can't post any code because of PIA, sorry.

The code is structured as an OTP application in which two gen_server modules are responsible for allocating some kind of resources. The application has run perfectly for some time and we haven't really had big issues.

The tricky part begins when the first gen_server needs to check whether the second has enough resources left. A call is issued to the second gen_server, which itself calls a utility library that (in a very, very special case) issues a call back to the first gen_server.

I'm relatively new to Erlang, but I think this situation is going to make the two gen_servers wait for each other.

This is probably a design problem, but I just wanted to know whether there is any special mechanism built into OTP that can prevent this kind of "hang".

Any help would be appreciated.

EDIT: To summarize the answers: if you have a situation where two gen_servers call each other in a cyclic way, you'd better spend some more time on the application design.

Thanks for your help :)

Solution

This is called a deadlock and can/should be avoided at the design level. Below is a possible workaround, plus some subjective points that will hopefully help you avoid making a mistake.

While there are ways to work around your problem, "waiting" is exactly what the call is doing.

One possible workaround would be to spawn a process from inside A which calls B, but does not block A from handling the call from B. This process would reply directly to the caller.

In server A:

handle_call(do_spaghetti_call, From, State) ->
    %% Hand the nested call to server B off to a helper process, so this
    %% server stays free to answer B's callback in the meantime. The
    %% helper replies directly to the original caller.
    spawn(fun() -> gen_server:reply(From, call_server_B(more_spaghetti)) end),
    {noreply, State};
handle_call(spaghetti_callback, _From, State) ->
    {reply, foobar, State}.

In server B:

handle_call(more_spaghetti, _From, State) ->
    %% This call back into server A is the cycle that would otherwise deadlock.
    {reply, gen_server:call(server_a, spaghetti_callback), State}.

To me this is very complex and super hard to reason about. I think you could even call it spaghetti code without offending anyone.

On another note, while the above might solve your problem, you should think hard about what a call chain like this actually implies. For example, what happens if server A executes this call many times? What happens if at any point there is a timeout? How do you configure the timeouts so they make sense? (The innermost call must have a shorter timeout than the outer calls, etc.)
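To make the timeout point concrete, here is a rough sketch of how the timeouts would have to be staggered; the server names and the numbers are invented for illustration only:

%% Client side: allow the whole round trip e.g. 10 seconds.
Result = gen_server:call(server_a, do_spaghetti_call, 10000),

%% Inside the helper process spawned by server A: the nested call to
%% server B must use a shorter timeout, otherwise the original caller
%% may give up while the inner call is still waiting for server B.
Reply = gen_server:call(server_b, more_spaghetti, 5000),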

I would change the design, even if it is painful, because when you allow this to exist and work around it, your system becomes very hard to reason about. IMHO, complexity is the root of all evil and should be avoided at all costs.

OTHER TIPS

It is mostly a design issue where you need to make sure that there are no long blocking calls from gen_server1. This can quite easily be done by spawning a small fun which takes care of your call to gen_server2 and then delivers the result to gen_server1 when done.

You would have to keep track of the fact that gen_server1 is waiting for a response from gen_server2. Something like this maybe:

handle_call(Msg, From, S) ->
  Self = self(),
  %% Let a helper process do the blocking call to gen_server2 and cast
  %% the result back to us when it is done.
  spawn(fun() ->
    Res = gen_server:call(gen_server2, Msg),
    gen_server:cast(Self, {reply, Res})
  end),
  {noreply, S#state{ from = From }}.

handle_cast({reply, Res}, S = #state{ from = From }) ->
  %% Forward the result to the original caller we stored earlier.
  gen_server:reply(From, Res),
  {noreply, S#state{ from = undefined }}.

This way gen_server1 can serve requests from gen_server2 without hanging. You would of course also need to propagate errors from the small process properly, but you get the general idea.
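As a sketch of that error propagation (using the same invented names as above), the helper could catch failures from the nested call and cast them back, so the waiting caller always gets some kind of reply:

handle_call(Msg, From, S) ->
  Self = self(),
  spawn(fun() ->
    %% Catch crashes and timeouts of the nested call so that the
    %% original caller is always answered, even on failure.
    Res = try gen_server:call(gen_server2, Msg)
          catch Class:Reason -> {error, {Class, Reason}}
          end,
    gen_server:cast(Self, {reply, Res})
  end),
  {noreply, S#state{ from = From }}.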

Another way of doing it, which I think is better, is to make this (resource) information passing asynchronous. Each server reacts and does what it is supposed to when it gets an (asynchronous) my_resource_state message from the other server. It can also prompt the other server to send its resource state with a send_me_your_resource_state asynchronous message. As both these messages are asynchronous they will never block, and a server can process other requests while it is waiting for a my_resource_state message from the other server after prompting it.

Another benefit of making these messages asynchronous is that servers can send off this information without being prompted, whenever they feel it is necessary, for example "help me, I am running really low!" or "I am overflowing, do you want some?".
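A minimal sketch of that idea could look like the clauses below; the registered names server_a and server_b, the #state{} record, and the update_plan/2 helper are all invented for illustration:

%% In server A: prompt server B for its resource state, without blocking.
handle_cast(check_peer_resources, State) ->
    gen_server:cast(server_b, {send_me_your_resource_state, server_a}),
    {noreply, State};

%% ...and react whenever the answer (or an unprompted update) arrives.
handle_cast({my_resource_state, _FromServer, Resources}, State) ->
    {noreply, update_plan(Resources, State)}.

%% In server B: answer the prompt with an asynchronous message as well.
handle_cast({send_me_your_resource_state, ReplyTo}, State = #state{resources = R}) ->
    gen_server:cast(ReplyTo, {my_resource_state, server_b, R}),
    {noreply, State}.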

The two replies from @Lukas and @knutin actually do it asynchronously, but they do it by spawning a temporary process, which can then do synchronous calls without blocking the servers. It is easier to use asynchronous messages straight off, and it is clearer in intent as well.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow