Discovering blocking Erlang threads

https://stackoverflow.com/questions/10487074

06-06-2021

Question

I have a project which has lots of modules, each one has different running threads. I wrote a little script which goes through each one and safely reloads the code (for hot swaps):

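%% Module name is hypothetical; reload_all/1 must be exported because
%% it is called fully qualified (?MODULE:reload_all/1).
-module(reloader).
-export([reload_all/0, reload_all/1]).

%% Hypothetical list: replace with the modules of your project.
-define(MODULE_LIST, [my_mod_a, my_mod_b]).

reload_all() ->
    ?MODULE:reload_all(?MODULE_LIST).

reload_all([]) -> ok;
reload_all([T|C]) ->
    io:fwrite("Purging ~w~n", [T]),
    try_purge(T),                     % wait until no process runs old T
    {module, T} = code:load_file(T),  % load the new version from disk
    ?MODULE:reload_all(C).            % fully qualified: newest version

%% Retry code:soft_purge/1, sleeping 1 s, 2 s, 3 s, ... between attempts.
try_purge(T) -> try_purge(T, 1).
try_purge(T, Wait) ->
    case code:soft_purge(T) of
        true -> ok;
        false ->
            io:fwrite("* Waiting ~w seconds for ~w module~n", [Wait, T]),
            timer:sleep(Wait * 1000),
            try_purge(T, Wait + 1)
    end.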

It uses code:soft_purge/1, which purges the old code only if no thread (Erlang process) is still running it, unlike code:purge/1, which would kill those processes. If the purge fails, the script waits for increasing intervals (1 s, then 2 s, then 3 s, and so on) and retries. I've designed the project so that the total wait should never exceed a minute, and in practice it should be almost instant.

The problem is that a module sometimes has a bug that makes it block indefinitely, so my reload_all() script never completes. That is the desired behavior, since it tells me that something is wrong. However, tracking down the bug takes a lot of testing and code analysis, and sometimes even that fails because the bug only shows up in the production environment and not in the test environment.

My question is: Is there a way to identify which threads are running the "old" code in a module, and see which function they are currently stuck in?

Solution

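You can check whether the old or the new version of a module is in use with erlang:check_old_code/1, which returns true if the module has old code loaded, and erlang:check_process_code/2, which returns true if a given process is still executing old code of that module. For each such process, erlang:process_info/2 shows where it currently is. See the documentation of the erlang module in the Erlang/OTP manual for details.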
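A minimal sketch of how these BIFs can be combined to answer the question directly: if the module has old code, scan all local processes, keep those that erlang:check_process_code/2 flags, and read each one's call stack with erlang:process_info(Pid, current_stacktrace). The module name old_code_procs and the functions find/1, report/1 and uses_old_code/2 are hypothetical. Wrapping check_process_code/2 in a try is a defensive assumption, in case a process exits during the scan; scanning every process can also be slow on a busy node, so treat this as a diagnostic tool.

```erlang
%% Hypothetical helper module: lists processes still using old code.
-module(old_code_procs).
-export([find/1, report/1]).

%% Return [{Pid, Stacktrace}] for every live process that
%% erlang:check_process_code/2 reports as using old code of Module.
find(Module) when is_atom(Module) ->
    case erlang:check_old_code(Module) of
        false ->
            %% No old code is loaded, so no process can be running it.
            [];
        true ->
            [{Pid, Stack}
             || Pid <- erlang:processes(),
                uses_old_code(Pid, Module),
                %% process_info/2 returns undefined for a process that
                %% has exited; the pattern below silently skips it.
                {current_stacktrace, Stack} <-
                    [erlang:process_info(Pid, current_stacktrace)]]
    end.

%% Print each process holding old code of Module and its current stack.
report(Module) ->
    lists:foreach(
      fun({Pid, Stack}) ->
              io:format("~p still runs old ~p code:~n~p~n",
                        [Pid, Module, Stack])
      end,
      find(Module)).

%% Assumption: treat a badarg (e.g. process gone) as "not using old code".
uses_old_code(Pid, Module) ->
    try erlang:check_process_code(Pid, Module)
    catch error:badarg -> false
    end.
```

For example, calling old_code_procs:report(T) in the false branch of try_purge/2, just before timer:sleep/1, would print the blocking processes on every retry. For a suspicious Pid, erlang:process_info(Pid, [current_function, status, message_queue_len]) also shows whether it is waiting in a receive and how many messages are queued.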

Licensed under: CC-BY-SA with attribution
