Question

I'd like to use Thrust (as most of my method is implemented using Thrust data types), or plain CUDA C if necessary, to sum only the positive floating-point elements of a vector. The data is not initially sorted. My first attempt was very poor: copy the vector, sort it, then find the zero crossing by passing it to a kernel that compares adjacent pairs of values and writes out the index where the sign changes. After sorting (which I do with Thrust), the kernel does this:

```cuda
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < n - 1) {
  float a = vector[i];
  float b = vector[i + 1];
  if (a >= 0.0 && b < 0.0)
    answer = i + 1;
}
```

This is really clumsy: lots of threads match the conditional, there are far too many reads, the branches diverge, and so on. It fails completely: each call gives a different result on the same data.

I have yet to find a good way to implement this in Thrust, which is what I would prefer. After sorting, I don't know how to find the zero crossing. Any advice on a jumping-off point here? A simple, working CUDA C implementation would be fine too.
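Once the data are sorted, the zero crossing can be found with a binary search rather than a pairwise-comparison kernel. The sketch below is host-side C++ using only standard-library algorithms, and it assumes a descending sort (which is what the a >= 0 && b < 0 test in the kernel above implies). The function name sum_positive_by_sorting is hypothetical. Thrust offers thrust::sort, thrust::partition_point and thrust::reduce with analogous semantics, but the device version is not shown here.

```cpp
// Host-side sketch: find the zero crossing of a descending-sorted vector
// with a binary search, then sum everything before it.
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Returns the sum of the positive elements of v.
// v is taken by value, so the caller's data is left unsorted.
float sum_positive_by_sorting(std::vector<float> v)
{
    // Descending order puts all non-negative values first.
    std::sort(v.begin(), v.end(), std::greater<float>());

    // Binary search for the first negative element (the "zero crossing").
    auto crossing = std::partition_point(v.begin(), v.end(),
                                         [](float x) { return x >= 0.0f; });

    // Zeros contribute nothing, so summing [begin, crossing) sums the positives.
    return std::accumulate(v.begin(), crossing, 0.0f);
}
```

This still pays for a full sort, so it only makes sense if the sorted data are needed anyway; the reduction in the solution avoids sorting altogether.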


Solution

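To sum only the positive values, you do not need to sort the data at all. Use thrust::transform_reduce, which applies a functor to each element (here, clamping negatives to zero) and reduces the results in one call: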

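```cuda
#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>

// Maps negative values to zero and leaves non-negative values unchanged.
// No base class is needed: transform_reduce only requires a callable.
template<typename T>
struct positive_value
{
   __host__ __device__ T operator()(const T &x) const
   {
     return x < T(0) ? T(0) : x;
   }
};

// data is a thrust::device_vector<float>. The initial value must be 0.0f,
// not 0: its type is the return type, so an int 0 would truncate the sum.
float result = thrust::transform_reduce(data.begin(), data.end(),
                                        positive_value<float>(),
                                        0.0f,
                                        thrust::plus<float>());
```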
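To check the functor logic on the CPU, the same transform-then-reduce pattern can be written with C++17's std::transform_reduce from <numeric>. This is a host-side analogue, not Thrust, and the function name sum_positive is hypothetical. Note the different argument order: the standard version takes the initial value before the reduction and transform operators, whereas thrust::transform_reduce takes the unary functor first.

```cpp
// Host-side analogue of the Thrust call, using C++17 std::transform_reduce.
#include <functional>
#include <numeric>
#include <vector>

// Sums only the positive elements of data: negatives are clamped to zero,
// then everything is added up.
float sum_positive(const std::vector<float>& data)
{
    return std::transform_reduce(
        data.begin(), data.end(),
        0.0f,                                         // float init: its type is the result type
        std::plus<float>(),                           // reduction operator
        [](float x) { return x < 0.0f ? 0.0f : x; }); // transform: clamp negatives to 0
}
```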
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow