Question

I have a rare heisenbug in a multi-threaded application where the main thread, and only this thread, will just do nothing. As it's a heisenbug, it's really hard to understand why this is happening.

The main thread is basically just looping. In the loop, it checks several concurrent priority queues that contain tasks ordered by the time at which they should be executed. It pops a task and checks whether it's time to execute it. If it is, the task is scheduled into TBB's task scheduler (using a root task that is the parent of all other tasks). If it's not time yet, the task is pushed back into the priority queue. That's one cycle. At the end of the cycle, the main thread is put to sleep for a very short time. I expect the actual sleep to be longer in practice, but that's not really a problem; I just don't want it to use too many resources when not necessary.

Literally:

    static const auto TIME_SCHEDULED_TASKS_SPAWN_FREQUENCY = microseconds(250);


    while( !m_task_scheduler.is_exiting() ) // check if the application should exit
    {
        m_clock_scheduler.spawn_realtime_tasks(); // here we spawn tasks if it's time 
        this_thread::sleep_for( TIME_SCHEDULED_TASKS_SPAWN_FREQUENCY );
    }

    m_clock_scheduler.clear_tasks();
    m_root_task.wait_for_all();
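
For illustration only, one cycle of the "check whether a task is due, otherwise leave it queued" logic described above could be sketched as follows. This is not the actual `spawn_realtime_tasks()` implementation: `TimedTask`, `DueLater`, `TimedQueue` and `spawn_due_tasks` are hypothetical names, the queue is a plain single-threaded `std::priority_queue` rather than a concurrent one, and due tasks are run inline instead of being handed to TBB.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical task record: a callable plus the time at which it becomes due.
struct TimedTask {
    Clock::time_point due;
    std::function<void()> run;
};

// Comparator that puts the task with the earliest due time on top of the heap.
struct DueLater {
    bool operator()(const TimedTask& a, const TimedTask& b) const {
        return a.due > b.due;
    }
};

using TimedQueue =
    std::priority_queue<TimedTask, std::vector<TimedTask>, DueLater>;

// One cycle: execute every task whose time has come, leave the rest queued.
// Returns the number of tasks executed.
// Assumption: the real code would spawn the task into the TBB scheduler here.
inline std::size_t spawn_due_tasks(TimedQueue& queue, Clock::time_point now) {
    std::size_t spawned = 0;
    while (!queue.empty() && queue.top().due <= now) {
        TimedTask task = queue.top();  // copy out before popping
        queue.pop();
        task.run();
        ++spawned;
    }
    return spawned;
}
```

Because the sketch can peek at the top element, it never needs to push a task back. A concurrent queue that only offers a pop operation would need the pop-then-push-back step described above.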

I have a special task that just logs a "TICK" message each second. It automatically reschedules itself until the end of the program. However, when the heisenbug appears, I can see the "TICK" messages stop and the application doing nothing other than the work that occurs in non-main threads. So it appears that only the main thread is affected.

The problem could come from different places, maybe the scheduling logic, but the main thread is also the only thread that has a sleep call. That sleep is a boost::this_thread::sleep_for().

My question is: Is it possible that Windows (7 64-bit) considers the main thread to be sleeping often and decides that it should sleep for a longer period of time than asked, or be ended permanently?

I expect that it is not possible, but I would like to be sure. I haven't found any details on this in the online documentation so far.


Update:

I have a friend who can reproduce the bug systematically (on Windows Vista, Core 2 Duo). I sent him one version without the sleep and another with the main loop reimplemented using condition_variable, so that each time a task is pushed into the queue the condition_variable wakes the main thread (while still enforcing a minimum time between spawns).
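
As a rough sketch of the condition_variable approach (not the code that was actually sent), the wake-up part might look like this. `WakeSignal` is a hypothetical helper name, and the sketch uses the standard library rather than Boost. It only shows the "wake on push, or after a timeout" half; the minimum spacing between spawns is left out.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: producers wake the scheduling loop early, while the
// loop still wakes on its own once the timeout elapses.
class WakeSignal {
public:
    // Called by any thread right after it pushes a task into a queue.
    void notify() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_pending = true;
        }
        m_cond.notify_one();
    }

    // Called by the main loop instead of sleep_for().
    // Returns true if a notify() was seen, false if the timeout elapsed.
    template <class Rep, class Period>
    bool wait_for(std::chrono::duration<Rep, Period> timeout) {
        std::unique_lock<std::mutex> lock(m_mutex);
        // The predicate guards against spurious wake-ups and missed signals.
        bool signalled =
            m_cond.wait_for(lock, timeout, [this] { return m_pending; });
        m_pending = false;
        return signalled;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_pending = false;
};
```

In the main loop, the `sleep_for` call would then be replaced by something like `signal.wait_for(TIME_SCHEDULED_TASKS_SPAWN_FREQUENCY)`, with every code path that pushes a task calling `signal.notify()` afterwards.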

The version without sleep works (but is slower), so the problem seems to be related to the sleep, even if I don't know the real source.

The version using condition_variable works, which would indicate that it's the sleep call that doesn't work correctly?

So, apparently I fixed the bug, but I still don't know why that specific sleep call can sometimes block.


Update:

It was actually a bug triggered by Boost code. I hunted down the bug and reported it, and it has been fixed. I didn't check previous versions, but it is fixed in Boost 1.55.


Solution

Is it possible that Windows (7 64-bit) considers the main thread to be sleeping often and decides that it should sleep for a longer period of time than asked, or be ended permanently?

No, this does not happen. Nothing in MSDN indicates that it could. Empirically, I have many Windows apps with periodic intervals ranging from milliseconds to hours. The effect you suggest does not happen; it would be disastrous for my apps.

Subject to the well-known granularity issues with Sleep() calls for very short intervals, a sleeping thread becomes ready when the interval expires. If a CPU core is available (i.e. the cores are not all busy running higher-priority threads), the newly ready thread starts running.

The OS will not extend the interval of a Sleep() call because of any historical or statistical data about the thread's states. I don't think it even keeps such data.
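
If you want to see the granularity effect for yourself, a small measurement helper along these lines may help. It is a sketch using the standard library rather than Boost, and `measured_sleep` is a hypothetical name. It shows only how far a short sleep overshoots the requested interval, which depends on the system timer resolution (commonly around 15 ms by default on Windows).

```cpp
#include <chrono>
#include <thread>

// Sleeps for `requested` and returns how long the call actually took,
// measured with a monotonic clock.
template <class Rep, class Period>
std::chrono::nanoseconds
measured_sleep(std::chrono::duration<Rep, Period> requested) {
    const auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(requested);
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
}
```

Calling `measured_sleep(std::chrono::microseconds(250))` in a loop and logging the results should show a fairly stable overshoot. A sleep that never returns points to a bug in the sleep implementation or the surrounding code, not to the OS deliberately stretching the interval.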

Licensed under: CC-BY-SA with attribution