Question

I'm trying to implement a basic object-oriented program in CUDA.

I have the following class:

/*
 * File: Point.h
 */

class Point {

    private:
        float x;
        float y;
        float z;

    public:
        __host__ __device__ Point(float x, float y, float z);

        __host__ __device__ float getX();
        __device__ float getY();
        __device__ float getZ();

        __device__ void setX(float x);
        __device__ void setY(float y);
        __device__ void setZ(float z);

        __device__ Point* operator+(Point* p) {
            return new Point(
                p->getX(),
                p->getY(),
                p->getZ()
            );
        };
};

/*
 * File: Point.cu
 */

#include "Point.h"


__host__ __device__ Point::Point(float x, float y, float z) {
    this->x = x;
    this->y = y;
    this->z = z;
};

__host__ __device__ float Point::getX() {return x;}
__device__ float Point::getY() {return y;}
__device__ float Point::getZ() {return z;}

__device__ void Point::setX(float x) {this->x = x;}
__device__ void Point::setY(float y) {this->y = y;}
__device__ void Point::setZ(float z) {this->z = z;}

In the main.cu file I wrote a kernel that adds 10 units to the 'x' component of each Point in a collection. This is the kernel:

__global__ void SumPoints(Point** d_arr_points) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    d_arr_points[i]->setX(d_arr_points[i]->getX() + 10);
}

To achieve this I'm using CUDA Thrust, but I still don't understand how raw pointers must be used to transfer the data. Currently I'm trying this in the main function:

int main(void) {

    thrust::host_vector<Point *> h_points(NUM_THREADS);
    for (int i = 0; i < NUM_THREADS; i++) {
        Point * new_point = new Point(1,0,0);
        h_points.push_back(new_point);
    }

    thrust::device_vector<Point *> d_points = h_points;
    Point ** d_arr_points = thrust::raw_pointer_cast(&d_points[0]);

    SumPoints<<<NUM_BLOCKS, NUM_THREADS_PER_BLOCK>>>(d_arr_points);

    return 0;
}

And nvcc throws the following error:

./main.cu(22): Error: External calls are not supported (found non-inlined call to _ZN5Point4setXEf)
make: *** [main.o] Error 2

Can anyone help me? Thanks!


Solution

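The points are allocated on the host heap, which the device cannot access; only the pointers are copied to the device, so the kernel would dereference host addresses. The compile error itself has a separate cause: Point::setX is defined in another translation unit (Point.cu), and unless relocatable device code is enabled (nvcc -rdc=true), every device function a kernel calls must be defined in the same translation unit as the kernel.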

Use thrust::host_vector<Point>, as suggested in presius litel snoflek's comment. With thrust::host_vector<Point> there is no need to call operator new explicitly. Likewise, avoid code such as __device__ Point* operator+(Point* p), which allocates with new on the device and invites trouble; return a Point by value instead. Provided its member functions are marked __host__ __device__, the class Point can then safely be used on both host and device.
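A minimal sketch of this approach, not a definitive implementation. It assumes the member functions are defined inline in the class, so that no external device call is needed, and that every method is marked __host__ __device__. The helper runSumPoints, the extra kernel parameter n, and the block size of 256 are hypothetical choices made for illustration; operator+ here performs a component-wise sum, which may not be what the original intended.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <cuda_runtime.h>

// Point stored by value. Every method is defined inline in the class and
// marked __host__ __device__, so the kernel never calls into another
// translation unit.
class Point {
    float x, y, z;

public:
    __host__ __device__ Point(float x = 0.f, float y = 0.f, float z = 0.f)
        : x(x), y(y), z(z) {}

    __host__ __device__ float getX() const { return x; }
    __host__ __device__ float getY() const { return y; }
    __host__ __device__ float getZ() const { return z; }

    __host__ __device__ void setX(float v) { x = v; }
    __host__ __device__ void setY(float v) { y = v; }
    __host__ __device__ void setZ(float v) { z = v; }

    // Returns the component-wise sum by value instead of allocating with new.
    __host__ __device__ Point operator+(const Point& p) const {
        return Point(x + p.x, y + p.y, z + p.z);
    }
};

// Adds 10 to the x component of each of the n points.
__global__ void SumPoints(Point* points, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {  // guard threads past the end of the array
        points[i].setX(points[i].getX() + 10.f);
    }
}

// Hypothetical helper (assumes n > 0): builds n points on the host, copies
// them to the device, runs the kernel, and copies the results back.
thrust::host_vector<Point> runSumPoints(int n) {
    thrust::host_vector<Point> h_points(n, Point(1.f, 0.f, 0.f));
    thrust::device_vector<Point> d_points = h_points;  // copies the objects, not pointers

    Point* d_arr_points = thrust::raw_pointer_cast(d_points.data());

    const int threadsPerBlock = 256;  // illustrative block size
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    SumPoints<<<blocks, threadsPerBlock>>>(d_arr_points, n);
    cudaDeviceSynchronize();

    h_points = d_points;  // device-to-host copy of the updated points
    return h_points;
}
```

Calling runSumPoints(NUM_THREADS) from main would return the updated points on the host. Because the vectors hold Point objects rather than Point pointers, the kernel only ever touches device memory.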

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow