문제

I'm trying to figure out how I can use OpenMP's for reduction() equivalent in CUDA. I've done some research online, and none of what I've tried worked. The code:

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)
    {
        float f = ...  //store return from function to f
        out[i] = f;    //store f to out[i]
        sum += f;      //add f to sum and store in sum
    }

I know what for reduction() does in OpenMP....it makes the last line of the for loop possible. But how can I use CUDA to express the same thing?

Thanks!

도움이 되었습니까?

해결책

Use Thrust, An STL inspired library that comes with CUDA. See the Quick Start Guide for examples on how to perform reductions.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top