Question

I am trying to use boost::thread_group to manage my threads. The design is that each thread in the thread group invokes a sequence of struct A functors.

Pseudocode:

struct A {
    int n;
    vector p;

    void operator()() {
        for(number_of_steps) // Do computations involving members n, p, x and y.
    }
private:
    float x;
    vector y;
};

struct parallel_A : boost::thread_group {
    parallel_A(const A* a) : m_a(a) {
        for(number_of_cpu) {
            create_thread(inner_struct(this));
        }
    }

    void run() {
        (*m_a)();
    }
private:
    struct inner_struct {
        parallel_A* a;

        inner_struct(parallel_A* _a) : a(_a) {}

        void operator()() {
            a->run(); 
        }
    };
    const A* m_a;
};

My question is:

  1. Will access to the data members n, p, x and y, and the computation in object A, be interleaved by the threads?

  2. If we were to go further and make more calls to functor A for each CPU (for example, 1 thread per CPU and, within each thread, 4 more invocations of functor A to do the computation), what would the behaviour be in terms of the state of the variables and the computation of A?


Solution

  1. Based on the code:

    for(number_of_cpu) {
        create_thread(inner_struct(this));
    }

    the same value of the this pointer will be passed to every thread, so all the threads share the same n, p, x and y data members. The computations of A will be interleaved in any case (except possibly inside critical sections), but because the computations now share the same data members, it is highly likely that one computation will read intermediate values written by another, resulting in data corruption (a data race).

    I suggest using some form of thread-local storage here, either by giving each thread its own A object (for example, an array of A objects, one per thread) or by using a formal mechanism such as boost::thread_specific_ptr. A minimal sketch follows after this list.

  2. If thread-local storage is not used (i.e. the code above is kept as it is), adding more invocations of functor A will only increase the chances of data corruption.

    If thread-local storage is used, instructions within a thread are still executed sequentially, so adding 4 more invocations of functor A to each thread means the computation will take approximately 5 times as long. This assumes that no sub-threads are created within each thread to handle the additional invocations.
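
As a rough sketch of the thread-local suggestion above (this is not code from the question; the names num_threads and workers are invented for illustration), each thread can be given its own copy of A so that no data members are shared:

#include <vector>
#include <boost/thread.hpp>
#include <boost/ref.hpp>

struct A {
    int n;
    std::vector<float> p;

    void operator()() {
        // do the per-step computation on this object's n, p, x and y here;
        // no other thread touches this object
    }

private:
    float x;
    std::vector<float> y;
};

int main() {
    unsigned num_threads = boost::thread::hardware_concurrency();
    if (num_threads == 0) num_threads = 2;   // fallback when the value is not known

    std::vector<A> workers(num_threads);     // one A per thread: nothing is shared
    boost::thread_group group;

    for (unsigned i = 0; i < num_threads; ++i) {
        // boost::ref avoids copying the worker; each thread runs operator()
        // on its own element of the vector
        group.create_thread(boost::ref(workers[i]));
    }

    group.join_all();
    return 0;
}

Alternatively, boost::thread_specific_ptr<A> can hold a separate heap-allocated A per thread: each thread calls reset(new A(...)) once and then invokes (*ptr)() as often as needed, which also covers the case from question 2 of several sequential invocations per thread.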

Licensed under: CC-BY-SA with attribution