Question

I'm trying to implement a basic object-oriented program in CUDA.

I have the following class:

/*
 * File: Point.h
 */

class Point {

    private:
        float x;
        float y;
        float z;

    public:
        __host__ __device__ Point(float x, float y, float z);

        __host__ __device__ float getX();
        __device__ float getY();
        __device__ float getZ();

        __device__ void setX(float x);
        __device__ void setY(float y);
        __device__ void setZ(float z);

        __device__ Point* operator+(Point* p) {
            return new Point(
                p->getX(),
                p->getY(),
                p->getZ()
            );
        };
};

/*
 * File: Point.cu
 */

#include "Point.h"


__host__ __device__ Point::Point(float x, float y, float z) {
    this->x = x;
    this->y = y;
    this->z = z;
};

__host__ __device__ float Point::getX() {return x;}
__device__ float Point::getY() {return y;}
__device__ float Point::getZ() {return z;}

__device__ void Point::setX(float x) {this->x = x;}
__device__ void Point::setY(float y) {this->y = y;}
__device__ void Point::setZ(float z) {this->z = z;}

In the main.cu file I wrote a kernel that adds 10 units to the 'x' component of each Point in a collection. This is the kernel:

__global__ void SumPoints(Point** d_arr_points) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    d_arr_points[i]->setX(d_arr_points[i]->getX() + 10);
}

To achieve this I'm using CUDA Thrust, but I still don't understand how raw pointers must be used to transfer data. Currently I'm trying this in the main function:

int main(void) {

    thrust::host_vector<Point *> h_points(NUM_THREADS);
    for (int i = 0; i < NUM_THREADS; i++) {
        Point * new_point = new Point(1,0,0);
        h_points.push_back(new_point);
    }

    thrust::device_vector<Point *> d_points = h_points;
    Point ** d_arr_points = thrust::raw_pointer_cast(&d_points[0]);

    SumPoints<<<NUM_BLOCKS, NUM_THREADS_PER_BLOCK>>>(d_arr_points);

    return 0;
}

And nvcc throws the following error:

./main.cu(22): Error: External calls are not supported (found non-inlined call to _ZN5Point4setXEf) make: *** [main.o] Error 2

Can anyone help me? Thanks!

Solution

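The points are allocated on the host heap, which device code cannot access; only the array of pointers is copied to the device, so the kernel would dereference host addresses. The compiler error itself appears at build time: the kernel in main.cu calls Point::setX, which is defined in a separate file (Point.cu), and nvcc cannot inline a device function across translation units unless relocatable device code is enabled (nvcc -rdc=true) or the definitions are visible in the header.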

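Use thrust::host_vector<Point> (a vector of objects rather than pointers), as suggested in presius litel snoflek's comment. With thrust::host_vector there is no need to call operator new explicitly. Likewise, avoid code such as __device__ Point* operator+(Point* p), which allocates with new inside device code and invites trouble; returning a Point by value is simpler. Once its methods are marked __host__ __device__, the class Point can safely be used on both host and device.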
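As a rough illustration of the advice above, here is a minimal sketch that stores Point objects by value and passes the kernel a plain Point* plus an element count. It is not the asker's exact code: the default constructor, the bounds check, the 256-thread block size and the helper name addTenToX are assumptions made for the example. Defining the methods inside the class body keeps them visible to the file that launches the kernel, which should also avoid the "External calls are not supported" message without needing -rdc=true.

```cuda
#include <cuda_runtime.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

// Methods are defined in the class body, so they are implicitly inline and
// visible to any .cu file that includes this definition.
class Point {
    float x, y, z;

public:
    __host__ __device__ Point() : x(0.0f), y(0.0f), z(0.0f) {}
    __host__ __device__ Point(float x, float y, float z) : x(x), y(y), z(z) {}

    __host__ __device__ float getX() const { return x; }
    __host__ __device__ void setX(float v) { x = v; }

    // Returns the sum by value: no new/delete in device code.
    __host__ __device__ Point operator+(const Point& p) const {
        return Point(x + p.x, y + p.y, z + p.z);
    }
};

// One thread per point; the bounds check covers a partially filled last block.
__global__ void SumPoints(Point* points, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {
        points[i].setX(points[i].getX() + 10.0f);
    }
}

// Hypothetical helper (name chosen for this sketch): copies the points to the
// device, runs the kernel, and copies the results back to the host.
thrust::host_vector<Point> addTenToX(const thrust::host_vector<Point>& h_points) {
    const int n = static_cast<int>(h_points.size());

    // Copies the Point objects themselves into device memory.
    thrust::device_vector<Point> d_points = h_points;

    const int threadsPerBlock = 256;  // assumed block size
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    if (n > 0) {
        SumPoints<<<blocks, threadsPerBlock>>>(
            thrust::raw_pointer_cast(d_points.data()), n);
        cudaDeviceSynchronize();  // wait for the kernel before copying back
    }

    // Device-to-host copy of the updated objects.
    thrust::host_vector<Point> result = d_points;
    return result;
}
```

Calling addTenToX on a thrust::host_vector<Point> filled with Point(1, 0, 0) should return points whose getX() is 11, while the input vector on the host is left unchanged.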

Licensed under: CC-BY-SA with attribution