If you only need to count how many iterations have been executed so far, a simple solution is to use a global atomic counter:
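```cpp
#include <atomic>
#include <cstddef>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

void Foo(std::size_t i);  // the per-iteration work, defined elsewhere

// Global count of completed iterations; safe to read from another thread.
// std::atomic is used because tbb::atomic was deprecated and removed in oneTBB.
std::atomic<std::size_t> atomic_progress_counter{0};

void ParallelFoo() {
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, 1000),
        [](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t i = r.begin(); i != r.end(); ++i) {
                Foo(i);
                ++atomic_progress_counter;  // one atomic increment per iteration
            }
        });
}
```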
However, if the amount of work per iteration is small and the hardware concurrency is high, atomic increments of a single shared variable can add noticeable overhead, because every thread contends for the same cache line. For example, I would be careful with this method on Intel's Xeon Phi coprocessors.