Question

The following code fails to join pthreads and the message "join failed" is printed. How do I get more information about the failure and its cause?

pthread_t aThread[MAX_LENGTH];
    int errCode[MAX_LENGTH];
    char returnVal;    
for(int i = 0; i < MAX_LENGTH; i++)
    {

        if((errCode[i] = pthread_create(&aThread[i], NULL, &findMatch, &fpArgs)) != 0)
            printf("error creating thread %d\n", errCode[i]);
        if(!pthread_join(aThread[i], (void**)&returnVal))
            printf("join failed\n i is %d", i);
    }

EDIT: actually join returned no error and I made a mistake. The if statement shouldn't have the ! because join returns a non-zero number if there is a problem, which evaluates to true.


Solution

I pointed this out in a comment, but it deserves amplification.

Your returnVal usage is wrong

The pthread_join API expects a void**, that is, a pointer to a void*. Unlike void*, a void** is not equally universal. It is a pointer of a specific type, and as such you should only pass a likewise typed address. However, you're not using it anyway, so I would suggest for now you simply pass NULL. As written, it is undefined behavior. And I can all but guarantee you that sizeof(char), the writable size of the address you're giving it, and sizeof(void*), the size it expects to have available, are not the same. Consider this instead for now:

pthread_join(aThread[i], NULL);

In case you're wondering what the use for that void** parameter is, it is a place to store void* return value from your thread-proc. Recall a pthread thread-proc looks like this:

void* thread_proc(void* args)
// ^----- this is what is stashed in the pthread_join second parameter
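To make the role of that second parameter concrete, here is a minimal sketch of collecting a thread's return value. The names double_it and run_and_collect are hypothetical, and the sketch assumes the common (implementation-defined) practice of carrying a small integer inside the void* via intptr_t; returning a pointer to heap-allocated data is the other usual choice.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical thread procedure: doubles the integer it is handed and
   returns the result carried inside its void* return value. */
static void *double_it(void *arg)
{
    intptr_t n = (intptr_t)arg;
    return (void *)(n * 2);
}

/* Starts one thread, joins it, and stores what the thread returned in *out.
   Returns 0 on success or the pthread error number on failure. */
static int run_and_collect(intptr_t input, intptr_t *out)
{
    pthread_t tid;
    void *result = NULL;   /* a real void*, so &result is a genuine void** */
    int rc;

    assert(out != NULL);

    rc = pthread_create(&tid, NULL, double_it, (void *)input);
    if (rc != 0)
        return rc;

    /* pthread_join copies double_it's return value into result. */
    rc = pthread_join(tid, &result);
    if (rc != 0)
        return rc;

    *out = (intptr_t)result;
    return 0;
}
```

Calling run_and_collect(21, &out) should leave 42 in out. The point is that the variable whose address is passed to pthread_join is itself a void*, not a char.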

Your logic for failure testing is backwards

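The pthread_join function returns 0 on success, not on failure. On failure it returns a non-zero error number, so !pthread_join(...) is true exactly when the join succeeded.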


You're not actually running concurrent threads

Thread concurrency simply means your threads run simultaneously. But yours do not. You start a thread, then wait for it to end, then start a thread, then wait for it to end, etc. This is literally no better (and in fact, actually worse) than simply calling a function. If you want your threads to run concurrently your logic should be styled like this:

pthread_t aThread[MAX_LENGTH];
int errCode[MAX_LENGTH] = {0};

for (int i = 0; i < MAX_LENGTH; i++)
{
    if((errCode[i] = pthread_create(&aThread[i], NULL, &findMatch, &fpArgs)) != 0)
        printf("error creating thread %d, error=%d\n", i, errCode[i]);
}

for (int i = 0; i < MAX_LENGTH; i++)
{
    // note the check for errCode[i], which is only non-zero 
    //  if the i'th thread failed to start
    if(errCode[i] == 0)
    {
        errCode[i] = pthread_join(aThread[i], NULL);
        if (errCode[i] != 0)
            printf("error joining thread %d, error=%d\n", i, errCode[i]);
    }
}

OTHER TIPS

When a pthread call fails it returns a non-zero error number. Unlike most system calls, it does not set errno; the return value itself is the reason for failure. There are a couple of ways to get the textual explanation of that failure code.

int returnval;

if((returnval = pthread_join(aThread[i], NULL)) != 0)
{
    printf("error joining thread: %s\n", strerror(returnval));  //1st option

    errno = returnval;  // pthread calls do not set errno themselves
    perror("error joining thread");  //2nd option

    errno = returnval;  // the calls above may have changed errno
    printf("error joining thread: %m\n");  //3rd option

}

(1) strerror returns the error string for the error value you pass it and is convenient for placing in printf statements.

(2) perror prints the string you pass it, followed by a colon and the description of whatever value errno is set to. Because pthread calls do not set errno, assign the return value to errno first.

(3) There is a glibc extension to printf that provides a %m conversion specifier, which acts like strerror(errno) but with a little less muss and fuss. It needs the same errno assignment, and it is the least portable option.

Once you get the description you can easily look into the man pages of the call that failed and they will provide greater hints as to why the call failed. Charlie Burns has posted the reasons pthread_join might fail.

Am I missing something? The return value tells you the error:

RETURN VALUES If successful, the pthread_join() function will return zero. Otherwise, an error number will be returned to indicate the error.

ERRORS pthread_join() will fail if:

 [EDEADLK]          A deadlock was detected or the value of thread specifies
                    the calling thread.

 [EINVAL]           The implementation has detected that the value specified
                    by thread does not refer to a joinable thread.

 [ESRCH]            No thread could be found corresponding to that specified
                    by the given thread ID, thread.

More specifically:

int retVal = pthread_create(&myThread, NULL, myThreadFn, NULL);
if (retVal != 0)
    printf("error creating thread: %d\n", retVal);
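A bare number is not very readable in a log, so one option is to translate the documented pthread_join error numbers into their symbolic names. The following is only a sketch; join_error_name is a hypothetical helper, not part of the pthread library, and it covers just the three codes quoted from the manual page.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: maps a pthread_join return value to the symbolic
   name used in the manual page. Anything else is reported as "unexpected". */
static const char *join_error_name(int rc)
{
    assert(rc >= 0);   /* POSIX error numbers are positive; 0 means success */

    switch (rc) {
    case 0:       return "success";
    case EDEADLK: return "EDEADLK";   /* deadlock, or joining yourself */
    case EINVAL:  return "EINVAL";    /* not a joinable thread */
    case ESRCH:   return "ESRCH";     /* no such thread */
    default:      return "unexpected";
    }
}
```

It could be used as printf("join failed: %s\n", join_error_name(rc)); right after the call, which makes it easy to look up the matching entry in the ERRORS section.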

The pthread library does not set the errno variable upon error. The error code is returned by the function instead. The online manual under Linux is quite clear for the pthread functions (e.g. man pthread_join) as the "RETURN VALUE" section generally contains something like:

RETURN VALUE

On success, pthread_join() returns 0; on error, it returns an error number.

If you need to output the error through functions like strerror(), strerror_r() or %m printf format (the latter is a GLIBC extension), you must use the return code of the failing service or update errno in the error branch:

if ((rc = pthread_join(...)) != 0) {
  errno = rc;  // %m reads errno, which pthread_join() does not set
  fprintf(stderr, "pthread_join(): %m\n");
  // OR
  fprintf(stderr, "pthread_join(): %s\n", strerror(rc));
  // OR
  char err_buf[128];
  // GNU strerror_r() returns char*; the XSI version returns int and fills err_buf
  fprintf(stderr, "pthread_join(): %s\n", strerror_r(rc, err_buf, sizeof(err_buf)));
}

Notes:

  • errno is thread-safe: it is located in thread-local storage, so each thread has its own copy
  • strerror_r() and %m should be used in a multi-threaded environment as they are thread-safe (strerror() is not required to be)
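
One practical wrinkle with strerror_r() is that it comes in two flavours: the GNU one returns a char*, while the XSI one returns an int and fills the buffer. The sketch below shows one way to hide that difference behind a helper. The name describe_error is hypothetical, and the preprocessor test assumes that the GNU flavour is what you get only on glibc with _GNU_SOURCE defined.

```c
/* Ask for the POSIX declarations unless the build already chose a mode. */
#if !defined(_GNU_SOURCE) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<what>: <description of code>" into buf
   without touching errno-based or static storage, and returns buf. */
static const char *describe_error(const char *what, int code,
                                  char *buf, size_t len)
{
    char text[128];
    const char *msg;

    assert(what != NULL && buf != NULL && len > 0);

#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    /* GNU flavour: the returned pointer may or may not point into text. */
    msg = strerror_r(code, text, sizeof text);
#else
    /* XSI flavour: returns 0 on success and fills text. */
    if (strerror_r(code, text, sizeof text) != 0)
        snprintf(text, sizeof text, "unknown error %d", code);
    msg = text;
#endif

    snprintf(buf, len, "%s: %s", what, msg);
    return buf;
}
```

A caller would pass the pthread return code directly, for example describe_error("pthread_join", rc, buf, sizeof buf), so errno never has to be involved.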
Licensed under: CC-BY-SA with attribution