Is it possible to make cuda use single-thread scoped variables (register or local memory) that are declared outside a function?

Most of my device functions need to use the same variables.

Instead of passing the same variables as parameters to all my device functions, I would like to declare the variables outside the functions.

Is that possible?

My compute capability is 1.2.

EDIT: An example:

__device__ __local__ int id;
__device__ __local__ int variable1 = 3;
__device__ __local__ int variable2 = 5;
__device__ __local__ int variable3 = 8;
__device__ __local__ int variable4 = 8;

//
__device__ int deviceFunction3() {
  variable1 += 8;
  variable4 += 7;
  variable2 += 1;
  variable3 += id;

  return variable1 + variable2 + variable3;
}

__device__ int deviceFunction2() {
  variable3 += 8; 
  variable1 += deviceFunction3();
  variable4 += deviceFunction3();

  return variable3 + variable4;
}

__device__ int deviceFunction1() {
  variable1 += id;
  variable4 += 2;
  variable2 += deviceFunction2();
  variable3 += variable2 + variable4;
  return variable1 + variable2 + variable3 + variable4;
}

// Kernel
__global__ void kernel(int *dev_a, int *dev_b, int *dev_c) {
  id = get_id();

  dev_c[id] = deviceFunction1();
}

The 3 device functions need to manipulate the same variables. Each variable is calculated independently for each thread. To my understanding, I cannot use the above code, because I cannot declare the variables so that they are local to each thread.

What I have to do instead is to declare all variables inside the kernel function, and then pass pointers to the variables to all the other functions:

__device__ int deviceFunction3(int* id,int* variable1,int* variable2,int* variable3,int* variable4) {
  *variable1 += 8;
  *variable4 += 7;
  *variable2 += 1;
  *variable3 += 2;

  return *variable1 + *variable2 + *variable3;
}

__device__ int deviceFunction2(int* id,int* variable1,int* variable2,int* variable3,int* variable4) {
  *variable3 += 8; 
  *variable1 += deviceFunction3(id,variable1,variable2,variable3,variable4);
  *variable4 += deviceFunction3(id,variable1,variable2,variable3,variable4);

  return *variable3 + *variable4;
}

__device__ int deviceFunction1(int* id,int* variable1,int* variable2,int* variable3,int* variable4) {
  *variable1 += *id;
  *variable4 += 2;
  *variable2 += deviceFunction2(id,variable1,variable2,variable3,variable4);
  *variable3 += *variable2 + *variable4;
  return *variable1 + *variable2 + *variable3 + *variable4;
}

// Kernel
__global__ void kernel(int *dev_a, int *dev_b, int *dev_c) {
  int id = get_id();
  int variable1 = 3;
  int variable2 = 5;
  int variable3 = 8;
  int variable4 = 8;

  dev_c[id] = deviceFunction1(&id,&variable1,&variable2,&variable3,&variable4);
}
Was this helpful?

Solution

Your usage case is a truly awful idea, and I wouldn't recommend that design pattern to my worst enemy. Leaving aside the merits of the code for a moment, as I hinted in comments, you can achieve the thread local variable scoping you desire by encapsulating the __device__ functions and variables they rely on in a structure, like this:

struct folly
{
    int id;
    int variable1;
    int variable2;
    int variable3;
    int variable4;

    __device__ folly(int _id) {
        id = _id;
        variable1 = 3;
        variable2 = 5;
        variable3 = 8;
        variable4 = 8;
    }

    __device__ int deviceFunction3() {
        variable1 += 8;
        variable4 += 7;
        variable2 += 1;
        variable3 += id;

        return variable1 + variable2 + variable3;
    }

    __device__ int deviceFunction2() {
        variable3 += 8; 
        variable1 += deviceFunction3();
        variable4 += deviceFunction3();

        return variable3 + variable4;
    }

    __device__ int deviceFunction1() {
        variable1 += id;
        variable4 += 2;
        variable2 += deviceFunction2();
        variable3 += variable2 + variable4;
        return variable1 + variable2 + variable3 + variable4;
    }
};

__global__ void kernel(int *dev_a, int *dev_b, int *dev_c) {
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    folly do_calc(id);
    dev_c[id] = do_calc.deviceFunction1();
}

Also note that CUDA supports C++ style pass by reference, so any one of the device functions you have written in the second piece of code you posted could easily be written like this:

__device__ int deviceFunction3(int & variable1, int & variable2, 
                               int & variable3, int & variable4) 
{
  variable1 += 8;
  variable4 += 7;
  variable2 += 1;
  variable3 += 2;

  return variable1 + variable2 + variable3;
}

which is far cleaner and easier to read.

Other tips

I just wanted to add that I have concluded that this is not possible. I find it to be a major design problem with CUDA C.

I have seen a keyword called __local__ in some slideshows, but I cannot find any documentation, and it is not recognised by nvcc either.

I guess that all variables that are supposed to only have the scope of a single thread must be declared inside functions only.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
scroll top