Error handling in a multi-threaded application

https://stackoverflow.com/questions/22700123

22-06-2023

Question

Say a thread function looks like:

void *threadFunc(void *args)
{
    if(args == NULL)
    {
        /*
         * Let's assume that this case is a fatal error which
         * cannot be recovered from.
         */

        fprintf(stderr, "Yikes... demons avoided.\n");
        exit(EXIT_FAILURE);
    }

    // Code to do stuff

    return NULL;  // Return value does not matter
}

Note: My example here is just an analogy I crafted to closely resemble the real problem I'm facing.

PS: Don't worry, my error messages are more descriptive in reality.

The scenario I'm seeing is that this fatal error is sometimes detected by more than one thread. When a thread detects the error and reaches the fprintf, it is sometimes preempted by another thread, which detects the same error and is in turn preempted when it reaches its own fprintf, and so on.

I'm wondering how to handle this case so that when one thread detects the fatal error, it shuts the application down immediately, without other threads interfering while it does so.

I'm thinking of surrounding the error detection with a mutex as follows:

void *threadFunc(void *args)
{
    lockMutex(mutex);
    if(args == NULL)
    {
        /*
         * Let's assume that this case is a fatal error which
         * cannot be recovered from.
         */

        fprintf(stderr, "Yikes... demons avoided.\n");
        exit(EXIT_FAILURE);
    }
    unlockMutex(mutex);

    // Code to do stuff

    return NULL;  // Return value does not matter
}

This does not seem very elegant to me because if a failure is detected, then the application will exit leaving the mutex locked. I know the OS is supposed to free all resources, but this still does not seem very nice to me.

Could you suggest another, possibly better way I can do this? Is my design itself broken?

Solution

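bool deadmanWalking = false;   // set once the first thread begins shutting down
mutex deathMutex = INIT....

void cleanup()
{
    lock(&deathMutex);

    if (deadmanWalking)
    {
        // Another thread is already shutting the application down:
        // return so the caller can end just its own thread.
        unlock(&deathMutex);
        return;
    }

    deadmanWalking = true;

    // cleanup code (runs at most once, with the mutex held)

    unlock(&deathMutex);
    exit(EXIT_FAILURE);
}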

//.........

    if(args == NULL)
    {
        cleanup();
        pthread_exit(...);
    }
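
For something closer to compilable code, here is a minimal sketch of the same once-only guard using POSIX threads. The helper names claimShutdown and fatalError are made up for illustration and are not part of any library. Unlike the pseudocode above, threads that lose the race end themselves right away instead of waiting for the cleanup to finish. One reason the guard matters: the C standard leaves the behavior undefined if exit is called more than once.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

pthread_mutex_t deathMutex = PTHREAD_MUTEX_INITIALIZER;
bool deadmanWalking = false;   /* protected by deathMutex */

/*
 * Hypothetical helper: returns true for exactly one caller, the first
 * thread to report the fatal error, and false for every later caller.
 */
bool claimShutdown(void)
{
    pthread_mutex_lock(&deathMutex);
    bool first = !deadmanWalking;
    deadmanWalking = true;
    pthread_mutex_unlock(&deathMutex);
    return first;
}

/*
 * Hypothetical helper: call this where the fatal error is detected.
 * It never returns to the caller.
 */
void fatalError(const char *message)
{
    if (claimShutdown())
    {
        /* Only one thread ever gets here, so the message is printed once. */
        fprintf(stderr, "%s\n", message);

        /* cleanup code */

        exit(EXIT_FAILURE);
    }

    /* Some other thread is already shutting down: end only this thread. */
    pthread_exit(NULL);
}
```

In the thread function, the error branch would then shrink to a single call such as fatalError("Yikes... demons avoided.").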

Arguably the best solution is to return an error indication from the error-detecting thread(s) to whatever started them, and let that code cancel the remaining threads and clean up in an orderly way, rather than dropping the bottom out from under them by exiting.
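
A rough sketch of that approach with POSIX threads follows, under a few assumptions: each worker reports failure through its return value, a shared atomic flag (stopRequested, a made-up name) asks the other workers to wind down cooperatively rather than being cancelled outright, and the starter collects the results with pthread_join. The names worker, runWorkers and MAX_WORKERS are hypothetical, and the loop body stands in for the real work.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_WORKERS 16   /* arbitrary limit for this sketch */

atomic_bool stopRequested;   /* static storage, so it starts out false */

void *worker(void *args)
{
    if (args == NULL)
    {
        /* Fatal error: ask the other workers to stop and report failure. */
        atomic_store(&stopRequested, true);
        return (void *)(intptr_t)1;   /* non-NULL means failure */
    }

    for (int step = 0; step < 1000 && !atomic_load(&stopRequested); ++step)
    {
        /* One unit of real work would go here. */
    }

    return NULL;   /* NULL means no error detected by this worker */
}

/*
 * Starts n workers, waits for all of them, and returns how many reported
 * a failure. Returns -1 if n is too large or a thread could not be created.
 */
int runWorkers(void *args[], int n)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;
    int failures = 0;

    if (n > MAX_WORKERS)
        return -1;

    atomic_store(&stopRequested, false);

    for (; started < n; ++started)
    {
        if (pthread_create(&tids[started], NULL, worker, args[started]) != 0)
        {
            atomic_store(&stopRequested, true);   /* wind down what did start */
            break;
        }
    }

    /* Join every thread that was started, so none is left running. */
    for (int i = 0; i < started; ++i)
    {
        void *result = NULL;
        pthread_join(tids[i], &result);
        if (result != NULL)
            ++failures;
    }

    return started == n ? failures : -1;
}
```

The starter can then print a single error message and return EXIT_FAILURE from main when runWorkers reports a nonzero count, so all cleanup happens in one thread after the others have finished.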

Licensed under: CC-BY-SA with attribution