Since you are restricted to what OpenMP 2.0 supports, the way to go is to write a custom reduction by hand. The basic scheme is:
- Before the parallel region, create an array for partial results. It should have at least as many elements as there are threads in the parallel region; you might use the `omp_get_max_threads()` function, which gives the upper bound. Initialize it with zeros (the identity element for summation).
- Inside the parallel region, use the `omp_get_thread_num()` function to obtain the number of the current thread within the parallel region, and use it as the index into the above array. Accumulate the result in the corresponding array element.
- After the region, use a serial loop to reduce the partial results accumulated in the array.
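The three steps above can be sketched in C roughly as follows. This is only an illustration under assumptions: `manual_sum`, `data`, and `n` are hypothetical names, and a plain sum of doubles stands in for whatever operation your real code needs to reduce. The `#ifdef _OPENMP` fallback is my addition so that the sketch also builds as serial code when OpenMP is not enabled.

```c
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP enabled. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: sums data[0..n-1] using a hand-written reduction. */
double manual_sum(const double *data, int n)
{
    int nthreads = omp_get_max_threads();   /* upper bound on team size */
    double *partial = (double *)malloc((size_t)nthreads * sizeof(double));
    double total = 0.0;
    int i, t;

    if (partial == NULL) {
        /* Allocation failed: fall back to a plain serial sum. */
        for (i = 0; i < n; ++i)
            total += data[i];
        return total;
    }

    /* Step 1: initialize every slot with the identity element (zero). */
    for (t = 0; t < nthreads; ++t)
        partial[t] = 0.0;

    /* Step 2: each thread accumulates into its own slot. */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int j;  /* OpenMP 2.0 requires a signed integer loop variable */

        #pragma omp for
        for (j = 0; j < n; ++j)
            partial[tid] += data[j];
    }

    /* Step 3: serial loop that combines the partial results. */
    for (t = 0; t < nthreads; ++t)
        total += partial[t];

    free(partial);
    return total;
}
```

Calling `manual_sum(data, n)` on an array holding 1, 2, ..., 1000 should return 500500. Note that adjacent elements of `partial` are likely to share a cache line, so if the loop body is cheap it may be worth accumulating into a variable local to each thread and storing it into `partial[tid]` once, after the loop.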