Question

I have a problem with Boost Asio on OS X, where the io_service destructor sometimes hangs indefinitely. I have a relatively simple repro case:

#include <ctime>        // std::time_t, std::tm, gmtime_r
#include <sys/time.h>   // timeval, gettimeofday

#include <boost/asio.hpp>
#include <boost/thread.hpp>

int main(int argc, char* argv[]) {
    timeval tv;
    gettimeofday(&tv, 0);
    std::time_t t = tv.tv_sec;
    std::tm curr;
    // The call to gmtime_r _seems_ innocent, but I cannot reproduce without this
    std::tm* curr_ptr = gmtime_r(&t, &curr);

    {
        boost::asio::io_service ioService;
        boost::asio::deadline_timer timer(ioService);

        ioService.post([&](){
            // This will also call gmtime_r, but just calling that is not enough
            timer.expires_from_now(boost::posix_time::milliseconds(1));
            timer.async_wait([](const boost::system::error_code &) {});
        });
        ioService.post([&](){
            ioService.post([&](){});
        });

        // Run some threads
        boost::thread_group workers;
        for (auto i=0; i<3; ++i) {
            workers.create_thread([&](){ ioService.run(); });
        }
        workers.join_all();
    } // hangs here in the io_service destructor
    return 0;
}

Basically, this just posts two handlers on the queue, one of which schedules a timer and the other just posts another handler. Sometimes this simple program causes the io_service destructor to hang indefinitely, in particular in the pipe_select_interrupter destructor during the kqueue_reactor destruction. This blocks in the system call close() on the pipe read descriptor.

To trigger the error I invoke the program in a loop using a shell script (but it is possible to trigger using a loop in the example above as well):

#!/bin/csh
set yname="foo"
while ( $yname != "" )
    date
    ./hangtest
end

I am no longer able to reproduce if I:

  • Remove the call to gmtime_r() in the beginning (!). Edit: This only appears to apply if I run using the script. If I instead add a loop in the program itself I can reproduce it without that call as well, as per the comment by ruslo.
  • Remove the call to async_wait() on the timer in the handler or move the timer setup outside of the handler.
  • Remove the post() in the second handler.
  • Lower the number of threads.
  • Place a mutex in kqueue_reactor::interrupt(). This function is invoked from both the async_wait() and the post(), and it calls kevent() on the same read descriptor that later cannot be closed.
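
To make the last workaround concrete, here is a minimal sketch of what "placing a mutex in interrupt()" amounts to. It is not Asio's code: reactor_sketch and run_interrupt_demo are hypothetical names, and the body of interrupt() is a placeholder counter standing in for the kevent() call, so this only illustrates the serialization, not the hang itself.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the reactor; NOT Asio's kqueue_reactor.
class reactor_sketch {
public:
    // Called concurrently by worker threads (in the question this happens
    // via post() and async_wait()).
    void interrupt() {
        // The workaround: only one thread at a time may run the body.
        std::lock_guard<std::mutex> lock(interrupt_mutex_);
        // Placeholder for the kevent() call on the interrupter's read
        // descriptor; here we only count invocations.
        ++interrupt_calls_;
    }

    int interrupt_calls() {
        std::lock_guard<std::mutex> lock(interrupt_mutex_);
        return interrupt_calls_;
    }

private:
    std::mutex interrupt_mutex_;
    int interrupt_calls_ = 0;
};

// Hypothetical driver: calls interrupt() from several threads at once and
// returns how many calls were recorded.
int run_interrupt_demo(int threads, int calls_per_thread) {
    reactor_sketch reactor;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&reactor, calls_per_thread] {
            for (int j = 0; j < calls_per_thread; ++j) {
                reactor.interrupt();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return reactor.interrupt_calls();
}
```

With the lock in place, concurrent callers never overlap inside interrupt(), which is presumably why the workaround hides the problem. It does not explain why overlapping kevent() calls would be harmful in the first place.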

Am I doing something wrong in the above code?

I am running on OS X 10.8.5 with Boost 1.54 and compiling with clang -stdlib=libc++ -std=c++11. I can also reproduce with Boost Asio from Boost 1.55 (with the rest of Boost 1.54 kept as-is).

Edit: I can reproduce on OS X 10.9.1 as well (using the same executable).

Solution

The fix for this was committed to the Asio master branch on April 29th, 2014. The commit message reads:

Fix occasional close() system call hang on MacOS.

Repeated re-registration of kqueue event filters seems to behave as though there is some kind of "leak" on MacOS, culminating in a suspended close() system call and an unkillable process. To avoid this, we will register a descriptor's kqueue event filters once only i.e. when the descriptor is first created.
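
As a rough illustration of the "register once" idea, the sketch below adds a pipe's read end to a kqueue a single time with EV_ADD | EV_CLEAR and then waits on it repeatedly without ever re-registering. This is not the actual Asio patch. The function name wakeups_with_single_registration is made up for this example, and using EV_CLEAR is an assumption about one way to avoid re-registration. On platforms without kqueue the function just returns -1.

```cpp
#if defined(__APPLE__) || defined(__FreeBSD__)
#define SKETCH_HAS_KQUEUE 1
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#endif
#include <unistd.h>

// Hypothetical helper, not part of Asio. Returns the number of wakeups seen
// after a single registration, or -1 if kqueue is unavailable or setup fails.
int wakeups_with_single_registration(int rounds) {
#ifdef SKETCH_HAS_KQUEUE
    int fds[2];
    if (::pipe(fds) != 0) return -1;
    int kq = ::kqueue();
    if (kq < 0) {
        ::close(fds[0]);
        ::close(fds[1]);
        return -1;
    }

    // Register the read end exactly once. EV_CLEAR resets the event state
    // after each retrieval, so no later EV_ADD is issued for this descriptor.
    struct kevent change;
    EV_SET(&change, fds[0], EVFILT_READ, EV_ADD | EV_CLEAR, 0, 0, 0);

    int wakeups = -1;
    if (::kevent(kq, &change, 1, nullptr, 0, nullptr) == 0) {
        wakeups = 0;
        for (int i = 0; i < rounds; ++i) {
            char byte = 1;
            if (::write(fds[1], &byte, 1) != 1) break;   // the "interrupt"
            struct kevent fired;
            struct timespec timeout = {1, 0};            // wait at most 1 s
            if (::kevent(kq, nullptr, 0, &fired, 1, &timeout) == 1) ++wakeups;
            if (::read(fds[0], &byte, 1) != 1) break;    // drain the pipe
        }
    }

    ::close(kq);
    ::close(fds[0]);
    ::close(fds[1]);
    return wakeups;
#else
    (void)rounds;  // kqueue is not available on this platform
    return -1;
#endif
}
```

The point of the sketch is only that each write to the pipe can wake the waiter without another EV_ADD, so the registration path that seems to "leak" on MacOS is exercised once per descriptor rather than once per interrupt.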

Licensed under: CC-BY-SA with attribution