Question

I have written an OpenCL kernel that uses OpenCL-OpenGL interoperability to read vertices and indices, but that is probably not even relevant, because I am just doing simple pointer addition to get a specific vertex by index.

uint pos = (index + base)*stride;

Here I am calculating the absolute position in bytes. In my example, pos is 28,643,328 with a stride of 28, index = 0 and base = 1,022,976. That seems correct.

Unfortunately, I can't use vload3 directly, because its offset parameter is not interpreted as an absolute address in bytes. So I just add pos to the pointer void* vertices_gl:

void* new_addr = vertices_gl+pos;

In my example, new_addr is 0x2f90000, and this is where the strange part begins, given that

vertices_gl = 0x303f000


The result (new_addr) should be 0x4B90000 (0x303f000 + 28,643,328).

I don't understand why the address vertices_gl is decreased by 716,800 (0xAF000) instead.
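
As a sanity check, the expected values can be reproduced on the host with a few lines of plain C. This is only a sketch of the arithmetic; the helper names byte_offset and expected_address are made up for illustration and do not appear in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a vertex: (index + base) * stride, as in the kernel line above. */
static uint32_t byte_offset(uint32_t index, uint32_t base, uint32_t stride) {
    assert(stride > 0);
    return (index + base) * stride;
}

/* Address expected when the offset is added byte-wise to the buffer start. */
static uint64_t expected_address(uint64_t buffer_start, uint32_t offset) {
    return buffer_start + offset;
}
```

With index = 0, base = 1,022,976 and stride = 28, byte_offset gives 28,643,328, and adding that to 0x303f000 gives 0x4B90000, so the numbers in the question are consistent.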


I'm targeting an AMD Radeon HD5830 GPU.

PS: For those wondering, I am using printf to get these values :) (I couldn't get CodeXL working.)

Solution

There is no pointer arithmetic for void* pointers. Use char* pointers to perform byte-wise pointer computations.
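
A minimal sketch of the byte-wise variant, written as plain host C so it can be compiled anywhere. The helpers byte_advance and load_float3 are hypothetical names; inside a kernel the same cast would go to `__global char *` (or `__global const char *`) so that the address space is preserved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer `offset` bytes past `base`.
   The cast to char * is what makes the addition byte-wise. */
static const void *byte_advance(const void *base, size_t offset) {
    assert(base != NULL);
    return (const char *)base + offset;
}

/* Copies three floats that start `offset` bytes into `buffer`. */
static void load_float3(const void *buffer, size_t offset, float out[3]) {
    memcpy(out, byte_advance(buffer, offset), 3 * sizeof(float));
}
```

With a stride of 28 bytes (seven floats per vertex), load_float3(buffer, 28, out) reads the first three floats of the second vertex.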

Or, much better: use the real type the pointer points to, and don't multiply offsets yourself. Simply write vertex[index+base], assuming vertex points to your type containing 28 bytes of data.
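
To illustrate why typed indexing replaces the manual multiplication, here is a small host-side C sketch. Vertex28, vertex_at and byte_distance are hypothetical names, and the struct is only a stand-in for whatever 28-byte layout the buffer really has.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 28-byte vertex: seven tightly packed floats. */
typedef struct {
    float data[7];
} Vertex28;

/* Typed indexing: the compiler scales (index + base) by sizeof(Vertex28). */
static const Vertex28 *vertex_at(const Vertex28 *vertices, size_t index, size_t base) {
    assert(vertices != NULL);
    return &vertices[index + base];
}

/* Distance in bytes between the selected vertex and the start of the buffer. */
static size_t byte_distance(const Vertex28 *vertices, size_t index, size_t base) {
    return (size_t)((const char *)vertex_at(vertices, index, base) -
                    (const char *)vertices);
}
```

The byte distance comes out as (index + base) * 28, the same value the manual computation of pos produces, without any explicit stride in the code.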

Performance consideration: Align your vertex data to a power-of-two size for coalesced memory access. With a 28-byte vertex, this means adding 4 bytes of padding after each vertex entry, for 32 bytes in total. To do this automatically, use float8 as the vertex type if your attributes are all floating-point values. I assume you work with position and normal data or something similar, so it might be a good idea to write a custom struct that encapsulates both vectors in a convenient and self-explanatory way:

// Defining a type for the vertex data. This is 32 bytes large.
// You can share this code in a header for inclusion in both OpenCL and C / C++!
typedef struct {
    float4 pos;
    float4 normal;
} VertexData;

// Example kernel
__kernel void computeNormalKernel(__global VertexData *vertex, uint base) {
    uint index = get_global_id(0);
    VertexData thisVertex = vertex[index+base];   // It can't be simpler!
    thisVertex.normal = computeNormal(...);       // Like you'd do it in C / C++!
    vertex[index+base] = thisVertex;              // Writing works the same way
}

Note: This code doesn't match your stride of 28 if you just change one of the float4s to a float3, since a float3 also occupies four floats (16 bytes) of memory. You can write the struct like this instead, which adds no padding, but be aware that the unaligned layout will cost memory bandwidth:

typedef struct {
    float pos[4];
    float normal[3];  // Assuming you want 3 floats here
} VertexData;
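
The size difference between the two layouts can be checked on the host with a short C sketch. PaddedVertex and PackedVertex are hypothetical stand-ins built from plain float arrays (the kernel-side float4 type is not available in plain C), and the sizes assume a 4-byte float.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-in for the padded layout (two float4 values in the kernel). */
typedef struct {
    float pos[4];
    float normal[4];
} PaddedVertex;

/* Tightly packed layout: 4 + 3 floats, no padding. */
typedef struct {
    float pos[4];
    float normal[3];
} PackedVertex;

/* Returns 1 if vertex i starts on a 32-byte boundary relative to the
   start of the buffer, 0 otherwise. */
static int starts_on_32_byte_boundary(size_t i, size_t vertex_size) {
    assert(vertex_size > 0);
    return (i * vertex_size) % 32 == 0;
}
```

Every PaddedVertex starts on a 32-byte boundary, whereas with the 28-byte PackedVertex only every eighth vertex does. That misalignment is what the bandwidth warning above refers to.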

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow