Boost Asio io_service destructor hangs on OS X

https://stackoverflow.com/questions/21835749

12-10-2022

Question

I have a problem with Boost Asio on OS X, where the io_service destructor sometimes hangs indefinitely. I have a relatively simple repro case:

#include <sys/time.h>  // gettimeofday, timeval
#include <ctime>       // std::time_t, std::tm, gmtime_r
#include <boost/asio.hpp>
#include <boost/thread.hpp>

int main(int argc, char* argv[]) {
    timeval tv;
    gettimeofday(&tv, 0);
    std::time_t t = tv.tv_sec;
    std::tm curr;
    // The call to gmtime_r _seems_ innocent, but I cannot reproduce without this
    std::tm* curr_ptr = gmtime_r(&t, &curr);

    {
        boost::asio::io_service ioService;
        boost::asio::deadline_timer timer(ioService);

        ioService.post([&](){
            // This will also call gmtime_r, but just calling that is not enough
            timer.expires_from_now(boost::posix_time::milliseconds(1));
            timer.async_wait([](const boost::system::error_code &) {});
        });
        ioService.post([&](){
            ioService.post([&](){});
        });

        // Run some threads
        boost::thread_group workers;
        for (auto i=0; i<3; ++i) {
            workers.create_thread([&](){ ioService.run(); });
        }
        workers.join_all();
    } // hangs here in the io_service destructor
    return 0;
}

Basically, this just posts two handlers on the queue, one of which schedules a timer and the other just posts another handler. Sometimes this simple program causes the io_service destructor to hang indefinitely, in particular in the pipe_select_interrupter destructor during the kqueue_reactor destruction. This blocks in the system call close() on the pipe read descriptor.

To trigger the error I invoke the program in a loop using a shell script (but it is possible to trigger using a loop in the example above as well):

#!/bin/csh
set yname="foo"
while ( $yname != "" )
    date
    ./hangtest
end
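The in-program loop mentioned above could look roughly like the following. This is only a sketch: run_trials is a hypothetical helper name, not part of Boost or of my original test program. It prints the iteration number before each run, so if one run hangs, the last number printed shows which iteration never returned.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>

// Hypothetical helper (not part of Boost.Asio): runs `trial` up to
// `iterations` times and returns how many runs completed.
std::size_t run_trials(std::size_t iterations,
                       const std::function<void()>& trial) {
    std::size_t completed = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        // std::endl flushes, so the line is visible even if trial() hangs.
        std::cout << "iteration " << i << std::endl;
        trial();
        ++completed;
    }
    return completed;
}
```

To use it, the scoped block from main() above (from the io_service construction down to the closing brace) would be wrapped in a lambda and passed as trial, so that the io_service destructor runs at the end of every iteration.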

I am no longer able to reproduce if I:

- Remove the call to gmtime_r() in the beginning (!). Edit: This only appears to apply if I run using the script. If I instead add a loop in the program itself I can reproduce it without that call as well, as per the comment by ruslo.
- Remove the call to async_wait() on the timer in the handler or move the timer setup outside of the handler.
- Remove the post() in the second handler.
- Lower the number of threads.
- Place a mutex in kqueue_reactor::interrupt(). This function is invoked from both the async_wait() and the post(), and calls kevent() with the read descriptor that is then not possible to close.
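To illustrate the last point, here is a minimal model of what "placing a mutex in interrupt()" means. It is not the actual patch and not Asio's kqueue_reactor: serialized_reactor and its members are hypothetical names, and the counter is only a placeholder for the kevent() call that the real function makes.

```cpp
#include <mutex>

// Hypothetical stand-in for a reactor; NOT Asio's kqueue_reactor.
class serialized_reactor {
public:
    // May be called from any worker thread. The lock ensures that only
    // one thread at a time executes the body.
    void interrupt() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++interrupt_count_;  // placeholder for the real kevent() call
    }

    // Number of interrupt() calls made so far.
    int interrupt_count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return interrupt_count_;
    }

private:
    mutable std::mutex mutex_;
    int interrupt_count_ = 0;
};
```

With the lock in place, the calls coming from async_wait() and post() on different threads can no longer overlap, which is the only thing this change alters.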

Am I doing something wrong in the above code?

I am running on OS X 10.8.5 with Boost 1.54 and compiling with clang -stdlib=libc++ -std=c++11. I can also reproduce with Boost Asio from Boost 1.55 (with the rest of Boost 1.54 kept as-is).

Edit: I can reproduce on OS X 10.9.1 as well (using the same executable).

Solution

The fix for this was committed to the Asio master branch on April 29th, 2014:

https://github.com/chriskohlhoff/asio/commit/3f473548a7d71012a77cd256a61034a505696958

Fix occasional close() system call hang on MacOS.

Repeated re-registration of kqueue event filters seems to behave as though there is some kind of "leak" on MacOS, culminating in a suspended close() system call and an unkillable process. To avoid this, we will register a descriptor's kqueue event filters once only i.e. when the descriptor is first created.
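The "register once only" idea from the commit message can be modelled roughly as follows. This is a simplified sketch of the bookkeeping, not the code from the commit: filter_registry and its members are hypothetical names, and the counter stands in for the kevent() call that would add the filters to the kqueue.

```cpp
#include <set>

// Hypothetical model of register-once bookkeeping; NOT Asio's actual code.
class filter_registry {
public:
    // Registers filters for `descriptor` only if it has not been seen yet.
    // Returns true if this call performed the registration.
    bool register_once(int descriptor) {
        const bool inserted = registered_.insert(descriptor).second;
        if (inserted) {
            ++kernel_registrations_;  // stands in for one kevent() call
        }
        return inserted;
    }

    // Forgets a descriptor, e.g. when it is closed.
    void deregister(int descriptor) { registered_.erase(descriptor); }

    // How many times the (simulated) kernel registration happened.
    int kernel_registrations() const { return kernel_registrations_; }

private:
    std::set<int> registered_;
    int kernel_registrations_ = 0;
};
```

In this model, starting any number of operations on the same descriptor results in a single registration, instead of one registration per operation as in the behaviour the commit message describes as problematic.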

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow