Erlang supervisor: restart a process, but give up and send a message if it fails several times

https://stackoverflow.com/questions/9854010

26-05-2021
Question

I have several gen_server workers that periodically request information from hardware sensors. Sensors may temporarily fail, which is normal. If a sensor fails, the worker terminates with an exception.

All workers are spawned from a supervisor with the simple_one_for_one strategy. I also have a control gen_server, which can start and stop workers and also receives 'DOWN' messages.

So now I have two problems:

1. If a worker is restarted by the supervisor, its state is lost, which is not acceptable to me. I need to recreate the worker with the same state.
2. If a worker fails several times within a period of time, something serious has happened with the sensors and it requires the operator's attention. I therefore need to give up restarting the worker and send a message to the event handlers. But the default behaviour of a supervisor is to terminate once the restart limit is exhausted.

I see two solutions:

1. Set the restart type of the processes in the supervisor to temporary, then monitor and restart them from the control gen_server. But this is exactly what a supervisor should do, so I would be reinventing the wheel.
2. Create a supervisor for each worker under the main supervisor. This solves my second problem exactly, but the state of a worker is still lost after a restart, so I need some storage, such as an ETS table, to hold the workers' states.

I am very new to Erlang, so I need some advice on my problem: which solution (if any) is the best? Thanks in advance.

Solution

> If a worker is restarted by the supervisor, its state is lost, which is not acceptable to me. I need to recreate the worker with the same state.

If you need the process state to outlive the process itself, you need to store it elsewhere, for example in an ETS table.
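As a minimal sketch of that idea (not a definitive implementation), the worker below writes its state to a named ETS table on every update and reads it back in init/1 after a restart. The module name sensor_worker, the table name sensor_state, the state shape, and the {reading, Value} message are all assumptions made for illustration. The table must be created by a process that outlives the workers, such as the control gen_server, because an ETS table is deleted when its owner terminates.

```erlang
-module(sensor_worker).
-behaviour(gen_server).

-export([start_link/1, create_table/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Hypothetical table name, chosen for this example only.
-define(TABLE, sensor_state).

%% Call once from a long-lived process (for example the control
%% gen_server). The caller becomes the table owner, so the table
%% survives worker crashes but not the owner's own termination.
create_table() ->
    ets:new(?TABLE, [named_table, public, set]).

start_link(SensorId) ->
    gen_server:start_link(?MODULE, SensorId, []).

%% On (re)start, restore the saved state if there is one,
%% otherwise begin with a fresh state.
init(SensorId) ->
    State = case ets:lookup(?TABLE, SensorId) of
                [{SensorId, Saved}] -> Saved;
                []                  -> #{id => SensorId, readings => []}
            end,
    {ok, State}.

handle_call(get_state, _From, State) ->
    {reply, State, State}.

%% Persist the new state before continuing, so a later crash
%% loses nothing that was already accepted.
handle_cast({reading, Value}, State = #{id := Id, readings := Rs}) ->
    NewState = State#{readings := [Value | Rs]},
    true = ets:insert(?TABLE, {Id, NewState}),
    {noreply, NewState}.

handle_info(_Msg, State) ->
    {noreply, State}.
```

If the saved state itself is what makes the worker crash, restoring it blindly will cause a crash loop, so it is worth saving only data known to be valid.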

> If a worker fails several times within a period of time, something serious has happened with the sensors and it requires the operator's attention. I therefore need to give up restarting the worker and send a message to the event handlers. But the default behaviour of a supervisor is to terminate once the restart limit is exhausted.

Correct. Generally speaking, the less logic you put into your supervisor, the better. Supervisors should just supervise child processes and that's it. But you could still monitor your supervisor and be notified whenever it gives up (just an idea). This way you avoid reinventing the wheel and still use the supervisor to manage the children.
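A rough sketch of the monitoring idea, assuming one small supervisor per worker: a separate process monitors each of those supervisors and forwards a notification when one goes down. The module name sup_watcher, the event manager name sensor_events, and the event tuple are hypothetical. The sketch assumes a gen_event manager is already registered locally as sensor_events, because gen_event:notify/2 fails when given a registered name that does not exist.

```erlang
-module(sup_watcher).
-behaviour(gen_server).

-export([start_link/0, watch/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Start watching the supervisor SupPid that manages sensor SensorId.
watch(SensorId, SupPid) ->
    gen_server:call(?MODULE, {watch, SensorId, SupPid}).

%% State is a map of MonitorRef => SensorId.
init([]) ->
    {ok, #{}}.

handle_call({watch, SensorId, SupPid}, _From, Watched) ->
    Ref = erlang:monitor(process, SupPid),
    {reply, ok, Watched#{Ref => SensorId}}.

handle_cast(_Msg, Watched) ->
    {noreply, Watched}.

%% A watched supervisor has terminated: tell the event handlers.
%% sensor_events is an assumed, already registered gen_event manager.
handle_info({'DOWN', Ref, process, _Pid, Reason}, Watched) ->
    case maps:take(Ref, Watched) of
        {SensorId, Rest} ->
            gen_event:notify(sensor_events,
                             {sensor_gave_up, SensorId, Reason}),
            {noreply, Rest};
        error ->
            {noreply, Watched}
    end;
handle_info(_Other, Watched) ->
    {noreply, Watched}.
```

A supervisor that exceeds its restart intensity exits with reason shutdown, which is also the reason seen when its parent stops it deliberately, so the reason alone does not distinguish the two cases. One option is to call erlang:demonitor/1 before stopping a worker on purpose. The per-worker supervisors would also need a temporary restart type under the main supervisor, otherwise the main supervisor would restart them in turn.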

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow