When you call pthread_join immediately after pthread_create, you're effectively serializing all the threads: each thread has to finish before the next one is even created. Don't join any thread until you've created all of them and done all the other work that doesn't need the results of the threaded computations.
C - pthreads appear to only be utilizing one core
11-10-2022
Question
Let me first say that this is for school, but I don't really need help; I'm just confused by some results I'm getting.
I have a simple program that approximates pi using Simpson's rule. In one assignment we had to do this by spawning 4 child processes, and in this assignment we have to use 4 kernel-level threads. I've done this, but when I time the programs, the one using child processes seems to run faster (I get the impression I should be seeing the opposite result).
Here is the program using pthreads:
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <stdlib.h>
// This complicated ternary statement does the bulk of our work.
// Basically depending on whether or not we're at an even number in our
// sequence we'll call the function with x/32000 multiplied by 2 or 4.
#define TERN_STMT(x) (((int)x%2==0)?2*func(x/32000):4*func(x/32000))
// Set to 0 to skip the speed test (25,000 repetitions per thread, 100,000 in total)
#define SPEED_TEST 1
struct func_range {
double start;
double end;
};
// The function defined in the assignment
double func(double x)
{
return 4 / (1 + x*x);
}
void *partial_sum(void *r)
{
double *ret = (double *)malloc(sizeof(double));
struct func_range *range = r;
#if SPEED_TEST
int k;
double begin = range->start;
for (k = 0; k < 25000; k++)
{
range->start = begin;
*ret = 0;
#endif
for (; range->start <= range->end; ++range->start)
*ret += TERN_STMT(range->start);
#if SPEED_TEST
}
#endif
return ret;
}
int main()
{
// An array for our threads.
pthread_t threads[4];
double total_sum = func(0);
void *temp;
struct func_range our_range;
int i;
for (i = 0; i < 4; i++)
{
our_range.start = (i == 0) ? 1 : (i == 1) ? 8000 : (i == 2) ? 16000 : 24000;
our_range.end = (i == 0) ? 7999 : (i == 1) ? 15999 : (i == 2) ? 23999 : 31999;
pthread_create(&threads[i], NULL, &partial_sum, &our_range);
pthread_join(threads[i], &temp);
total_sum += *(double *)temp;
free(temp);
}
total_sum += func(1);
// Final calculations
total_sum /= 3.0;
total_sum *= (1.0/32000.0);
// Print our result
printf("%f\n", total_sum);
return EXIT_SUCCESS;
}
Here is the version using child processes:
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>
// This complicated ternary statement does the bulk of our work.
// Basically depending on whether or not we're at an even number in our
// sequence we'll call the function with x/32000 multiplied by 2 or 4.
#define TERN_STMT(x) (((int)x%2==0)?2*func(x/32000):4*func(x/32000))
// Set to 0 to skip the speed test (25,000 repetitions per child process, 100,000 in total)
#define SPEED_TEST 1
// The function defined in the assignment
double func(double x)
{
return 4 / (1 + x*x);
}
int main()
{
// An array for our subprocesses.
pid_t pids[4];
// The pipe to pass-through information
int mypipe[2];
// Counter for subproccess loops
double j;
// Counter for outer loop
int i;
// Number of PIDs
int n = 4;
// The final sum
double total_sum = 0;
// Temporary variable holding the result from a subproccess
double temp;
// The partial sum tallied by a subproccess.
double sum = 0;
int k;
if (pipe(mypipe))
{
perror("pipe");
return EXIT_FAILURE;
}
// Create the PIDs
for (i = 0; i < 4; i++)
{
// Abort if something went wrong
if ((pids[i] = fork()) < 0)
{
perror("fork");
abort();
}
else if (pids[i] == 0)
{
// Depending on what PID number we are we'll only calculate
// 1/4 the total.
#if SPEED_TEST
for (k = 0; k < 25000; ++k)
{
sum = 0;
#endif
switch (i)
{
case 0:
sum += func(0);
for (j = 1; j <= 7999; ++j)
sum += TERN_STMT(j);
break;
case 1:
for (j = 8000; j <= 15999; ++j)
sum += TERN_STMT(j);
break;
case 2:
for (j = 16000; j <= 23999; ++j)
sum += TERN_STMT(j);
break;
case 3:
for (j = 24000; j < 32000; ++j)
sum += TERN_STMT(j);
sum += func(1);
break;
}
#if SPEED_TEST
}
#endif
// Write the data to the pipe
write(mypipe[1], &sum, sizeof(sum));
exit(0);
}
}
int status;
pid_t pid;
while (n > 0)
{
// Wait for the calculations to finish
pid = wait(&status);
// Read from the pipe
read(mypipe[0], &temp, sizeof(temp));
// Add to the total
total_sum += temp;
n--;
}
// Final calculations
total_sum /= 3.0;
total_sum *= (1.0/32000.0);
// Print our result
printf("%f\n", total_sum);
return EXIT_SUCCESS;
}
Here is a time result from the pthreads version running 100,000 times:
real 11.15
user 11.15
sys 0.00
And here is the child process version:
real 5.99
user 23.81
sys 0.00
A user time of 23.81 suggests that it is the sum of the time each core spent executing the code. In the pthreads run, the real and user times are the same, which implies that only one core is being used. Why isn't it using all 4 cores? I thought that by default it might do this better than child processes.
Hopefully this question makes sense; this is my first time programming with pthreads, and I'm pretty new to OS-level programming in general.
Thanks for taking the time to read this lengthy question.
Solution