Can I iterate over a C++11 std::tuple with OpenMP?

Question

The C++11 template syntax is highly alien to me, but recursive problems such as this one are best made parallel using explicit OpenMP tasks:

template<std::size_t I = 0, typename FuncT, typename... Tp>
inline typename std::enable_if<I < sizeof...(Tp), void>::type
for_each(std::tuple<Tp...>& t, FuncT& f)
{
    // I is a compile-time template parameter, not a variable,
    // so it cannot appear in a data-sharing clause.
    #pragma omp task shared(t,f)
    {
        f(std::get<I>(t));
    }
    for_each<I + 1, FuncT, Tp...>(t, f);
}

...

// Proper usage
#pragma omp parallel
{
    #pragma omp single
    for_each(...);
}

The important part is that the top-level call to for_each sits in a single construct inside a parallel region. Only one thread therefore calls for_each, and each recursion step queues f(std::get<I>(t)); for later execution as an explicit task. The other threads, while waiting at the implicit barrier at the end of the single construct, pull tasks from the task queue and execute them in parallel until the queue is empty. The sharing classes of all variables used by the task are given explicitly for clarity.
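The single-producer pattern described above can be sketched without any tuple machinery. This is only an illustration under stated assumptions: squares_with_tasks and check_squares are hypothetical names that do not appear in the original code, and the work done per task (squaring an index) is a placeholder. One thread creates the tasks inside single, and the rest of the team picks them up at the implicit barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the original answer): one thread
// creates n tasks, and the whole team may execute them.
std::vector<int> squares_with_tasks(std::size_t n)
{
    std::vector<int> results(n, 0);

    #pragma omp parallel
    {
        #pragma omp single
        {
            for (std::size_t i = 0; i < n; ++i)
            {
                // i is copied into the task; results is shared. Each task
                // writes a distinct element, so there is no data race.
                #pragma omp task firstprivate(i) shared(results)
                {
                    results[i] = static_cast<int>(i * i);
                }
            }
        } // implicit barrier of single: all queued tasks have completed here
    }
    return results;
}

// Usage sketch: the result does not depend on how many threads ran the tasks.
void check_squares()
{
    const std::vector<int> r = squares_with_tasks(5);
    assert(r.size() == 5);
    assert(r[4] == 16);
}
```

If the code is compiled without OpenMP support, the pragmas are ignored and the loop simply runs serially, producing the same result.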

The objects that t and f refer to should be shared, while the references themselves (basically the pointers that implement the references) should be firstprivate. On the other hand, the OpenMP standard prohibits reference types from being firstprivate, and different compiler vendors tend to implement the standard differently. The Intel C++ Compiler accepts the following code and gives the correct results inside the task, but the referenced variable is privatised (which is wrong):

#include <iostream>
using namespace std;

void f(int& p)
{
   #pragma omp task
   {
      cout << "p = " << p << endl;
      p = 3;
      cout << "p' = " << p << endl;
   }
}

void f1()
{
   int i = 5;

   #pragma omp parallel
   {
      #pragma omp single
      f(i);
   }
   cout << "i = " << i << endl;
}

PGI's compiler gives the correct result and does not privatise i. GCC, on the other hand, correctly determines that p should be firstprivate, but then runs into the prohibition in the standard and gives a compile-time error.

If one modifies the task to read:

#pragma omp task shared(p)
{
    ...
}

it works correctly with GCC, but the task prints the wrong initial value of p and then causes a segmentation fault with both the Intel C++ Compiler and PGI's C++ compiler.

Go figure!
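One way to sidestep the reference question, sketched here as an assumption and not tested against the compilers mentioned above, is to pass plain pointers: a pointer is an ordinary variable, so firstprivate on it copies the pointer value while the pointed-to object stays shared. The names for_each_ptr, Doubler and double_all are hypothetical, and the base-case overload is a guess at what the elided part of the original code provides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>

// Base case (assumed): I has reached the tuple size, nothing left to visit.
template<std::size_t I = 0, typename FuncT, typename... Tp>
inline typename std::enable_if<I == sizeof...(Tp), void>::type
for_each_ptr(std::tuple<Tp...>*, FuncT*)
{
}

// Recursive case: queue one task per element, then move to the next index.
template<std::size_t I = 0, typename FuncT, typename... Tp>
inline typename std::enable_if<(I < sizeof...(Tp)), void>::type
for_each_ptr(std::tuple<Tp...>* t, FuncT* f)
{
    // The pointers are copied into the task; the tuple and the functor
    // they point to remain shared.
    #pragma omp task firstprivate(t, f)
    {
        (*f)(std::get<I>(*t));
    }
    for_each_ptr<I + 1, FuncT, Tp...>(t, f);
}

// Hypothetical stateless functor: doubles an element in place.
struct Doubler
{
    template<typename T>
    void operator()(T& x) const { x = x + x; }
};

// Usage sketch: each task touches a different element, so no data race.
std::tuple<int, double, std::string> double_all()
{
    std::tuple<int, double, std::string> t(1, 1.5, "ab");
    Doubler f;

    #pragma omp parallel
    {
        #pragma omp single
        for_each_ptr(&t, &f);
    } // all tasks have completed once the parallel region ends

    assert(std::get<0>(t) == 2);
    return t;
}
```

The tuple and the functor must outlive the parallel region, which holds here because both are locals of the function that contains the region.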