Question

I have to write a multithreaded program (solving a system of equations by the rotation method). My program gives the right answer, but it runs more slowly when I create more threads. Would anyone be able to help me with this? Part of my code:

typedef struct DATA
{
    double *a;      /* n x n matrix, stored row by row */
    int n;
    int num_thr;    /* index of this thread */
    int total_thr;  /* number of threads */
    int num_row1;
    int num_row2;
    double cos;
    double sin;
} DATA;


void synchronize(int total_threads)
{
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t condvar_in = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t condvar_out = PTHREAD_COND_INITIALIZER;
    static int threads_in = 0;
    static int threads_out = 0;

    pthread_mutex_lock(&mutex);

    threads_in++;
    if (threads_in >= total_threads)
    {
        threads_out = 0;
        pthread_cond_broadcast(&condvar_in);
    } else
        while (threads_in < total_threads)
            pthread_cond_wait(&condvar_in, &mutex);

    threads_out++;
    if (threads_out >= total_threads)
    {
        threads_in = 0;
        pthread_cond_broadcast(&condvar_out);
    } else
        while (threads_out < total_threads)
            pthread_cond_wait(&condvar_out, &mutex);

    pthread_mutex_unlock(&mutex);
}

void rotation(double *a, int n, int num_thr, int total_thr, int num_row1, int num_row2, double cos, double sin)
{
    int k;
    double m;
    int first;

    first = n - 1 - num_thr;
    for (k = first; k >= num_row1; k = k - total_thr)
    {
        m = a[num_row1*n+k];
        a[num_row1*n+k] = cos*a[num_row1*n+k] + sin*a[num_row2*n+k];
        a[num_row2*n+k] = -sin*m + cos*a[num_row2*n+k];
    }
    synchronize(total_thr);
}

void *rotation_threaded(void *pa)
{
    DATA *data = (DATA*)pa;
    rotation(data->a, data->n, data->num_thr, data->total_thr, data->num_row1, data->num_row2, data->cos, data->sin);
    return 0;
}



int main(int argc, char * argv[])
{
................


    for (i = 0; i < n; i++)
    {
        for (j = i + 1; j < n; j++)
        {
            n1 = a[j*n+i];
            m = a[i*n+i];

            cos = m/sqrt(m*m+n1*n1);
            sin = n1/sqrt(m*m+n1*n1);
            for (t = 0; t < total_thr; t++)
            {
                data[t].n = n;
                data[t].a = a;
                data[t].total_thr = total_thr;
                data[t].num_thr = t;
                data[t].num_row1 = i;
                data[t].num_row2 = j;
                data[t].cos = cos;
                data[t].sin = sin;
            }

            for (k = 0; k < total_thr; k++)
            {
                if (pthread_create(threads+k, 0, rotation_threaded, data+k))
                {
                    printf(" Couldn't create %d thread", k);
                    return 3;
                }
            }
            for (k = 0; k < total_thr; k++)
            {
                if (pthread_join(threads[k], 0))
                    printf("Mistake %d \n", k);
            }
            h = b[i];
            b[i] = cos*b[i] + sin*b[j];
            b[j] = -sin*h + cos*b[j];
        }
    }

..............
}

No correct solution

OTHER TIPS

To answer the specific question: your threads spend more time trying to obtain a lock and waiting on condition variables than doing actual work. Multithreading isn't a free "get more power" scheme: if you constantly have to acquire highly contended locks, you pay a severe overhead penalty. The more threads you have, the more they fight over the locks and the more time they spend blocked while another thread holds the lock.

To combat this: synchronize only when you have to. Queue up a lot of changes, and/or do more work at once, so each thread makes real use of its time on the CPU. When you do synchronize, hold the lock for the shortest time necessary.
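As a generic illustration of "do more work, lock less" (not the poster's code), here is a minimal sketch in which each thread sums its own slice into a local variable and takes the mutex exactly once to publish the result. The names SLICE, sum_slice, parallel_sum and MAX_THR are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THR 64   /* arbitrary cap for this sketch */

typedef struct {
    const double *x;
    size_t begin, end;        /* half-open range [begin, end) */
    double *total;            /* shared result */
    pthread_mutex_t *lock;    /* protects *total */
} SLICE;

static void *sum_slice(void *p)
{
    SLICE *s = p;
    double local = 0.0;

    /* All the real work happens without holding the lock. */
    for (size_t i = s->begin; i < s->end; i++)
        local += s->x[i];

    /* One short critical section per thread. */
    pthread_mutex_lock(s->lock);
    *s->total += local;
    pthread_mutex_unlock(s->lock);
    return 0;
}

double parallel_sum(const double *x, size_t n, int nthr)
{
    pthread_mutex_t lock;
    pthread_t th[MAX_THR];
    SLICE sl[MAX_THR];
    int started[MAX_THR];
    double total = 0.0;

    assert(nthr > 0 && nthr <= MAX_THR);
    pthread_mutex_init(&lock, 0);

    for (int t = 0; t < nthr; t++) {
        size_t lo = n * (size_t)t / (size_t)nthr;
        size_t hi = n * (size_t)(t + 1) / (size_t)nthr;
        sl[t] = (SLICE){ x, lo, hi, &total, &lock };
        started[t] = (pthread_create(&th[t], 0, sum_slice, &sl[t]) == 0);
        if (!started[t])
            sum_slice(&sl[t]);   /* fall back to the calling thread */
    }
    for (int t = 0; t < nthr; t++)
        if (started[t])
            pthread_join(th[t], 0);

    pthread_mutex_destroy(&lock);
    return total;
}
```

Locking inside the inner loop instead would make every element pay for the mutex, which is the kind of contention described above.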

Last but not least: more threads are not always better. If you have multiple threads crunching through a queue of jobs, it's often better to spin up only as many threads as there are logical CPU cores, so the threads don't have to compete for the same cores. As with everything, though, proper profiling will tell you where the problems are.
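One way to cap the thread count at the number of logical cores is sketched below. It assumes a system where sysconf accepts _SC_NPROCESSORS_ONLN (Linux, macOS and the BSDs do, but it is an extension rather than part of POSIX); the function name default_thread_count is made up for this example.

```c
#include <unistd.h>

/* Return a thread count no larger than the number of online logical cores.
   A requested value below 1 means "use all cores". */
int default_thread_count(int requested)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);

    if (cores < 1)
        cores = 1;               /* query failed or unsupported */
    if (requested < 1 || requested > cores)
        return (int)cores;
    return requested;
}
```

In the program above this could be used as total_thr = default_thread_count(total_thr); before any threads are created.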

From what I can see, you recreate your threads for each (i,j) pair, then you wait for all threads to finish.

It would probably make more sense to create your threads at the beginning and have them wait on a condition. The threads could then be reused.
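A rough sketch of that idea follows, under a few assumptions: the platform provides pthread_barrier_t (an optional POSIX feature that macOS, for example, does not ship), and the names JOB, WORKER, pool_start, pool_rotate and pool_stop are hypothetical. The workers are created once, block on a barrier until the main thread publishes the next rotation, and meet the main thread at a second barrier when the step is done.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose pthread_barrier_t in strict -std modes */
#endif
#include <assert.h>
#include <pthread.h>

/* Shared by all workers; only the "current step" fields change between steps. */
typedef struct {
    double *a;                 /* n x n matrix, stored row by row */
    int n;
    int total_thr;
    int row1, row2;            /* current step: rows being rotated */
    double c, s;               /* current step: cosine and sine */
    int quit;                  /* set to 1 to make the workers exit */
    pthread_barrier_t start;   /* main + workers meet here to begin a step */
    pthread_barrier_t done;    /* and here again when the step is finished */
} JOB;

typedef struct {
    JOB *job;
    int id;                    /* the only per-thread value */
} WORKER;

static void *worker(void *p)
{
    WORKER *w = p;
    JOB *job = w->job;

    for (;;) {
        pthread_barrier_wait(&job->start);
        if (job->quit)
            return 0;
        /* Same strided column split as the original rotation(). */
        for (int k = job->n - 1 - w->id; k >= job->row1; k -= job->total_thr) {
            double *x = &job->a[job->row1 * job->n + k];
            double *y = &job->a[job->row2 * job->n + k];
            double m = *x;
            *x = job->c * m + job->s * *y;
            *y = -job->s * m + job->c * *y;
        }
        pthread_barrier_wait(&job->done);
    }
}

/* Create the workers once. Error handling is minimal: a failure part-way
   through leaves the already started workers blocked on the barrier. */
int pool_start(JOB *job, pthread_t *threads, WORKER *workers)
{
    assert(job->total_thr > 0);
    job->quit = 0;
    if (pthread_barrier_init(&job->start, 0, job->total_thr + 1) ||
        pthread_barrier_init(&job->done, 0, job->total_thr + 1))
        return -1;
    for (int t = 0; t < job->total_thr; t++) {
        workers[t].job = job;
        workers[t].id = t;
        if (pthread_create(&threads[t], 0, worker, &workers[t]))
            return -1;
    }
    return 0;
}

/* Run one rotation on all workers and wait until it has finished. */
void pool_rotate(JOB *job, int row1, int row2, double c, double s)
{
    job->row1 = row1;
    job->row2 = row2;
    job->c = c;
    job->s = s;
    pthread_barrier_wait(&job->start);
    pthread_barrier_wait(&job->done);
}

void pool_stop(JOB *job, pthread_t *threads)
{
    job->quit = 1;
    pthread_barrier_wait(&job->start);   /* release workers so they see quit */
    for (int t = 0; t < job->total_thr; t++)
        pthread_join(threads[t], 0);
    pthread_barrier_destroy(&job->start);
    pthread_barrier_destroy(&job->done);
}
```

In main, pool_start would be called once before the i/j loops, pool_rotate(&job, i, j, cos, sin) would replace the pthread_create/pthread_join pair inside them, and pool_stop would be called after the loops. Each step still performs only O(n) arithmetic between two barrier waits, so for small matrices the synchronization cost may continue to dominate.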

You also seem to copy a lot of information that is the same for every thread on each iteration (that's probably not the reason for the slowdown, but why not make it clear what actually varies?). The only field in data that differs between threads is num_thr. The values of n and a never change, and the values of cos, sin, i and j could be stored once, outside the for-t loop.

What is the purpose of the synchronize function? It seems to wait for all threads to pass the threads_in "barrier" and then the threads_out "barrier", but what is it protecting?

Licensed under: CC-BY-SA with attribution