Your implementation looks very elegant.
According to the Open Group Specification of pthread_key_create, you don't have to set the reference to NULL in the destructor:
An optional destructor function may be associated with each key value. At thread exit, if a key value has a non-NULL destructor pointer, and the thread has a non-NULL value associated with that key, the value of the key is set to NULL, and then the function pointed to is called with the previously associated value as its sole argument.
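Note that this covers only the per-thread value, not the key itself: the key is not destroyed automatically and stays allocated until you call pthread_key_delete, which in turn does not run any destructors. In the destructor you only have to take care of what's stored behind the key, which is precisely what your delete ((T*)obj); does.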
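To make the quoted behaviour concrete, here is a minimal sketch, not your actual code. Payload, destroy_payload, worker and run_demo are hypothetical names I made up for illustration; only the pthread_* calls are real API. It checks that the slot already reads back as NULL when the destructor runs, so resetting it there is unnecessary.

```cpp
#include <pthread.h>

// Hypothetical payload type, standing in for your T.
struct Payload { int value; };

static pthread_key_t g_key;
static int g_destructor_calls = 0;
static bool g_slot_was_null = false;

// Called by pthreads at thread exit with the previously stored value.
extern "C" void destroy_payload(void* obj) {
    // The slot has already been reset to NULL before this call.
    g_slot_was_null = (pthread_getspecific(g_key) == nullptr);
    ++g_destructor_calls;
    delete static_cast<Payload*>(obj);  // free what is stored behind the key
}

extern "C" void* worker(void*) {
    pthread_setspecific(g_key, new Payload{42});
    return nullptr;  // thread exit triggers destroy_payload
}

// Returns true if the destructor ran exactly once and saw a NULL slot.
bool run_demo() {
    g_destructor_calls = 0;
    g_slot_was_null = false;
    if (pthread_key_create(&g_key, destroy_payload) != 0) return false;

    pthread_t thread;
    if (pthread_create(&thread, nullptr, worker, nullptr) != 0) {
        pthread_key_delete(g_key);
        return false;
    }
    pthread_join(thread, nullptr);

    // The key itself is not released automatically; delete it explicitly.
    pthread_key_delete(g_key);
    return g_destructor_calls == 1 && g_slot_was_null;
}
```

Calling run_demo() from main should return true on a conforming implementation. The explicit pthread_key_delete at the end is the one piece of cleanup the destructor mechanism does not do for you.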