Are C++ exceptions sufficient to implement thread-local storage?

https://stackoverflow.com/questions/2487509

21-09-2019
|

Question

I was commenting on an answer that thread-local storage is nice and recalled another informative discussion about exceptions where I supposed

The only special thing about the execution environment within the throw block is that the exception object is referenced by rethrow.

Putting two and two together, wouldn't executing an entire thread inside a function-catch-block of its main function imbue it with thread-local storage?

It seems to work fine, albeit slowly. Is this novel or well-characterized? Is there another way of solving the problem? Was my initial premise correct? What kind of overhead does get_thread incur on your platform? What's the potential for optimization?

#include <iostream>
#include <pthread.h>
using namespace std;

struct thlocal {
    string name;
    thlocal( string const &n ) : name(n) {}
};

struct thread_exception_base {
    thlocal &th;
    thread_exception_base( thlocal &in_th ) : th( in_th ) {}
    thread_exception_base( thread_exception_base const &in ) : th( in.th ) {}
};

thlocal &get_thread() throw() {
    try {
        throw;
    } catch( thread_exception_base &local ) {
        return local.th;
    }
}

void print_thread() {
    cerr << get_thread().name << endl;
}

void *kid( void *local_v ) try {
    thlocal &local = * static_cast< thlocal * >( local_v );
    throw thread_exception_base( local );
} catch( thread_exception_base & ) {
    print_thread();

    return NULL;
}

int main() {
    thlocal local( "main" );
    try {
        throw thread_exception_base( local );
    } catch( thread_exception_base & ) {
        print_thread();

        pthread_t th;
        thlocal kid_local( "kid" );
        pthread_create( &th, NULL, &kid, &kid_local );
        pthread_join( th, NULL );

        print_thread();
    }

    return 0;
}

This does require defining new exception classes derived from thread_exception_base, initializing the base with get_thread(), but altogether this doesn't feel like an unproductive insomnia-ridden Sunday morning…

EDIT: Looks like GCC makes three calls to pthread_getspecific in get_thread. EDIT: and a lot of nasty introspection into the stack, environment, and executable format to find the catch block I missed on the first walkthrough. This looks highly platform-dependent, as GCC is calling some libunwind from the OS. Overhead on the order of 4000 cycles. I suppose it also has to traverse the class hierarchy but that can be kept under control.

Solution

In the playful spirit of the question, I offer this horrifying nightmare creation:

class tls
{
    void push(void *ptr)
    {
        // allocate a string to store the hex ptr 
        // and the hex of its own address
        char *str = new char[100];
        sprintf(str, " |%x|%x", ptr, str);
        strtok(str, "|");
    }

    template <class Ptr>
    Ptr *next()
    {
        // retrieve the next pointer token
        return reinterpret_cast<Ptr *>(strtoul(strtok(0, "|"), 0, 16));
    }

    void *pop()
    {
        // retrieve (and forget) a previously stored pointer
        void *ptr = next<void>();
        delete[] next<char>();
        return ptr;
    }

    // private constructor/destructor
    tls() { push(0); }
    ~tls() { pop(); }

public:
    static tls &singleton()
    {
        static tls i;
        return i;
    }

    void *set(void *ptr)
    {
        void *old = pop();
        push(ptr);
        return old;
    }

    void *get()
    {
        // forget and restore on each access
        void *ptr = pop();
        push(ptr);
        return ptr;
    }
};

Taking advantage of the fact that according to the C++ standard, strtok stashes its first argument so that subsequent calls can pass 0 to retrieve further tokens from the same string, so therefore in a thread-aware implementation it must be using TLS.

example *e = new example;

tls::singleton().set(e);

example *e2 = reinterpret_cast<example *>(tls::singleton().get());

So as long as strtok is not used in the intended way anywhere else in the program, we have another spare TLS slot.

OTHER TIPS

I think you're onto something here. This might even be a portable way to get data into callbacks that don't accept a user "state" variable, as you've mentioned, even apart from any explicit use of threads.

So it sounds like you've answered the question in your subject: YES.

void *kid( void *local_v ) try {
    thlocal &local = * static_cast< thlocal * >( local_v );
    throw local;
} catch( thlocal & ) {
    print_thread();

    return NULL;
}

void *kid (void *local_v ) { print_thread(local_v); }

I might be missing something here, but it's not a thread local storage, just unnecessarily complicated argument passing. Argument is different for each thread only because it is passed to pthread_create, not because of any exception juggling.

It turned out that I indeed was missing that GCC is producing actual thread local storage calls in this example. It actually makes the issue interesting. I'm still not quite sure whether it is a case for other compilers, and how is it different from calling thread storage directly.

I still stand by my general argument that the same data can be accessed in a more simple and straight-forward way, be it arguments, stack walking or thread local storage.

Accessing data on the current function call stack is always thread safe. That's why your code is thread safe, not because of the clever use of exceptions. Thread local storage allows us to store per-thread data and reference it outside of the immediate call stack.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow