There are two approaches I can think of for solving this sort of problem:
- Use a thrust zip iterator to combine a counting iterator with the input data, and modify your existing functor to accept tuples of (index, data). You can have the functor return the data when the index matches your criteria, and zero otherwise. Because zero is the identity for addition, this will work correctly with scan and reduction algorithms.
- Use a thrust permutation iterator to gather only the data you want to sum, and pass it to the standard reduce algorithm. The thrust developers have an example strided iterator which you can use to solve the problem of only processing every nth entry of an input range.
It might be worth implementing both and benchmarking them to see which approach is faster.
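As a rough illustration of the two approaches, here is a minimal sketch that sums every third element of a small vector both ways. The selection rule (index divisible by `stride`), the functor names `masked_value` and `strided_index`, and the test data are all assumptions made up for this example, not taken from your code. The second approach builds its index map from a counting iterator and a transform iterator rather than using the strided iterator example itself.

```cuda
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/reduce.h>
#include <thrust/sequence.h>
#include <thrust/transform_reduce.h>
#include <thrust/tuple.h>
#include <iostream>

// Approach 1 functor (hypothetical): takes an (index, value) tuple and
// returns the value if the index is selected, and zero otherwise.
struct masked_value
{
    int stride;
    explicit masked_value(int s) : stride(s) {}

    __host__ __device__
    int operator()(const thrust::tuple<int, int>& t) const
    {
        const int index = thrust::get<0>(t);
        const int value = thrust::get<1>(t);
        return (index % stride == 0) ? value : 0;
    }
};

// Approach 2 functor (hypothetical): maps output position i to the
// input position i * stride.
struct strided_index
{
    int stride;
    explicit strided_index(int s) : stride(s) {}

    __host__ __device__
    int operator()(int i) const { return i * stride; }
};

int main()
{
    const int n = 10;
    const int stride = 3;

    // Test data: 1, 2, ..., 10
    thrust::device_vector<int> data(n);
    thrust::sequence(data.begin(), data.end(), 1);

    // Approach 1: zip a counting iterator with the data and mask
    // unselected entries to zero inside the transform step.
    auto zipped = thrust::make_zip_iterator(
        thrust::make_tuple(thrust::counting_iterator<int>(0), data.begin()));
    const int sum_zip = thrust::transform_reduce(
        zipped, zipped + n, masked_value(stride), 0, thrust::plus<int>());

    // Approach 2: gather only the selected entries through a permutation
    // iterator, then run a plain reduce over the shorter range.
    const int selected = (n + stride - 1) / stride;  // indices 0, 3, 6, 9
    auto index_map = thrust::make_transform_iterator(
        thrust::counting_iterator<int>(0), strided_index(stride));
    auto gathered = thrust::make_permutation_iterator(data.begin(), index_map);
    const int sum_perm = thrust::reduce(
        gathered, gathered + selected, 0, thrust::plus<int>());

    // Both sums cover the values 1, 4, 7, 10.
    std::cout << sum_zip << " " << sum_perm << std::endl;  // prints "22 22"
    return 0;
}
```

Note that the first version still reads every input element and only discards values in the functor, while the second only touches the selected elements but with strided (non-contiguous) memory access, which is why benchmarking on your real data sizes is the only reliable way to choose.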