Question

Is there an easy way to pass a float4 or any other vector argument to an OpenCL kernel? A scalar argument (int, float) can be passed directly when calling the kernel. An array argument first has to be copied to the GPU with cl.Buffer(), and then the buffer is passed as a pointer. It is probably possible to pass a float4 the same way as an array, but I am asking whether there is an easier and clearer way (especially using Python, numpy and PyOpenCL).

I tried passing a numpy array of size 4*float32 as a float4, but it does not work. Is it possible to do it some other way?

For example, the kernel:

__kernel void myKernel( __global float  * myArray, float myFloat, float4 myFloat4 )

Python:

myFloat4   = numpy.array([1.0, 2.0, 3.0], dtype=numpy.float32)
myArray    = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=myArray_host)
kernelargs = (myArray, numpy.float32(myFloat), myFloat4)
prg.myKernel(queue, myArray_host.shape, None, *(kernelargs))

I got the error:

pyopencl.LogicError: when processing argument #2 (1-based): clSetKernelArg failed: invalid arg size

The other possibility is passing it as a set of scalar ints or floats, like:

__kernel void myKernel( __global float  * myArray, float myFloat, float myFloat4_x, float myFloat4_y, float myFloat4_z  )

kernelargs = ( myArray , numpy.float32(myFloat) ,numpy.float32(myFloat4_x),numpy.float32(myFloat4_y),numpy.float32(myFloat4_z))

But this is also not very convenient: it is easy to get lost in many variable names if you want to pass, for example, 4x float4 and 5x int3 to the kernel.

I think passing vectors (with 2, 3 or 4 components) of int and float must be quite common in OpenCL, for example for the size of 3D data grids. So I wonder whether it is really necessary to pass them as pointers using cl.Buffer().

I also guess that a constant float4 argument is faster than a float* (because it can be shared as a constant by all work-items).
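One detail worth knowing when passing int3 or float3 values: in OpenCL C, a 3-component vector has the same size and alignment as the matching 4-component vector, so an int3 parameter expects 16 bytes, not 12. A minimal sketch of a host-side helper that pads three values with a zero fourth component; the name as_cl_vec3 is hypothetical and not a PyOpenCL function.

```python
import numpy as np


def as_cl_vec3(values, dtype):
    """Hypothetical helper: pack 3 values for an OpenCL int3/float3 argument.

    OpenCL 3-component vectors occupy the same space as 4-component ones,
    so the fourth slot is padding; it is filled with zero here.
    """
    values = np.asarray(values, dtype=dtype)
    if values.shape != (3,):
        raise ValueError("expected exactly 3 components")
    padded = np.zeros(4, dtype=dtype)
    padded[:3] = values
    return padded


grid_size = as_cl_vec3([64, 64, 32], np.int32)
print(grid_size, grid_size.nbytes)
```

The resulting 16-byte array could then go into the kernel argument list the same way the answers below report passing a four-element float32 array for a float4.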


Solution

I find this a nice way to create an array of float4 values in Python:

import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array

data= np.zeros(N, dtype=cl_array.vec.float4)
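To see what such an array looks like in memory without a device at hand, the layout can be imitated with a plain numpy structured dtype of four float32 fields named x, y, z and w. This is an illustrative stand-in (float4_like is a made-up name), not PyOpenCL's own definition; the point is that each element takes 16 bytes, exactly one float4.

```python
import numpy as np

# Illustrative stand-in for a float4 dtype: four packed float32 fields.
float4_like = np.dtype([("x", np.float32), ("y", np.float32),
                        ("z", np.float32), ("w", np.float32)])

N = 3
data = np.zeros(N, dtype=float4_like)
data[0] = (1.0, 2.0, 3.0, 4.0)   # assign one whole vector as a tuple
data["w"] = 1.0                  # set the w component of every element

print(data.itemsize, data.nbytes)   # 16 bytes per element, 48 in total
print(data[0])
```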

Edit: To also give a minimal working example (MWE):

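import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array


deviceID = 0
platformID = 0
workGroup = (1, 1)

N = 10
# Host array of N float4 elements (16 bytes each)
testData = np.zeros(N, dtype=cl_array.vec.float4)

dev = cl.get_platforms()[platformID].get_devices()[deviceID]

ctx = cl.Context([dev])
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
# Device buffer large enough to hold testData
Data_In = cl.Buffer(ctx, mf.READ_WRITE, testData.nbytes)


prg = cl.Program(ctx, """
__kernel void Pack_Cmplx(__global float4* Data_In, int N)
{
  int gid = get_global_id(0);

  // Skip work-items past the end of the buffer
  if (gid < N)
    Data_In[gid] = (float4)(1.0f);  // set all four components to 1.0f
}
""").build()

# Global size (N, 1): one work-item per float4 element
prg.Pack_Cmplx(queue, (N, 1), workGroup, Data_In, np.int32(N))
# Copy the result back to the host (blocking by default)
cl.enqueue_copy(queue, testData, Data_In)


print(testData)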
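Once the data is back on the host, the structured float4 array can be viewed as a plain (N, 4) float32 array, which is often easier to slice and compare. The sketch below does not run the kernel: it fills a numpy stand-in dtype (an assumption standing in for cl_array.vec.float4) with the value the kernel above writes.

```python
import numpy as np

# Stand-in for the float4 dtype: four packed float32 fields.
float4_like = np.dtype([("x", np.float32), ("y", np.float32),
                        ("z", np.float32), ("w", np.float32)])

N = 10
testData = np.zeros(N, dtype=float4_like)
testData[:] = (1.0, 1.0, 1.0, 1.0)   # what the kernel writes into every element

# Reinterpret the same memory as N rows of 4 float32 values (no copy is made).
as_matrix = testData.view(np.float32).reshape(N, 4)
print(as_matrix.shape)
print(bool(np.all(as_matrix == 1.0)))
```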

OTHER TIPS

The problem is here:

myFloat4   = numpy.array  ( [1.0 ,2.0 ,3.0], dtype=numpy.float32 )

myFloat4.size is 3, so the array holds only 12 bytes, while a float4 argument needs four floats (16 bytes).

Just use four values:

myFloat4   = numpy.array  ( [1.0 ,2.0 ,3.0, 4.0], dtype=numpy.float32 )

The rest of the code is fine.
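A quick way to see the mismatch on the host side is to compare the byte size of each candidate argument with the 16 bytes a float4 occupies. This sketch uses only numpy; FLOAT4_NBYTES is a constant written down from the OpenCL type size (four 4-byte floats), not something queried from a device.

```python
import numpy as np

# A float4 kernel parameter is four 32-bit floats, i.e. 16 bytes.
FLOAT4_NBYTES = 4 * np.dtype(np.float32).itemsize

too_short = np.array([1.0, 2.0, 3.0], dtype=np.float32)        # 3 elements -> 12 bytes
just_right = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)  # 4 elements -> 16 bytes

for name, arg in [("too_short", too_short), ("just_right", just_right)]:
    status = "matches" if arg.nbytes == FLOAT4_NBYTES else "does not match"
    print(f"{name}: {arg.nbytes} bytes, {status} float4 ({FLOAT4_NBYTES} bytes)")
```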

I noticed three things:

  1. Looking at the error message, there seems to be an issue with the 2nd kernel argument, i.e. myFloat. What happens if you declare it as a const argument in the kernel signature? What happens if you do

    myFloat = np.float32(myFloat)   # a plain Python float has no .astype()
    kernelArgs = (..., myFloat, ...)
    prg.myKernel(...)
    
  2. You want to define a four-element vector myFloat4, but you give only three values, [1.0, 2.0, 3.0]. Also try declaring const float4 myFloat4 in the kernel signature.

  3. You don't need the additional parentheses around the kernelargs tuple in the actual kernel call:

    prg.myKernel(queue, myArray_host.shape, None, *kernelargs)
    

For me, creating a numpy array of shape (SIZE, 4) and dtype float32 worked fine when I ran an OpenCL kernel. Make sure the second dimension matches the kind of floatN you want: no error is raised if they don't match, but in my case the mismatch crashed the graphics card driver.

The way I initialized my arrays: np.zeros((SIZE, 4), dtype=np.float32)

Hope this helps anybody who is wondering the same.
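Because a mismatched second dimension is not reported, a small host-side check before creating the buffer can catch it early. The helper below (check_vector_buffer is a hypothetical name, not part of PyOpenCL) assumes a kernel that reads size elements through a __global floatN* pointer. It also accounts for float3, whose array elements use the same 16-byte stride as float4, so a float3 buffer needs 4 columns as well.

```python
import numpy as np


def check_vector_buffer(arr, n_components, size):
    """Hypothetical helper: verify arr can back a __global floatN* of `size` elements."""
    # 3-component vectors use the same 16-byte stride as 4-component ones.
    stride = 4 if n_components == 3 else n_components
    expected_shape = (size, stride)
    if arr.dtype != np.float32:
        raise TypeError(f"expected float32, got {arr.dtype}")
    if arr.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {arr.shape}")
    if not arr.flags["C_CONTIGUOUS"]:
        raise ValueError("array must be C-contiguous")
    return arr


SIZE = 8
good = check_vector_buffer(np.zeros((SIZE, 4), dtype=np.float32), 4, SIZE)

rejected_msg = None
try:
    # Wrong second dimension for a float4 kernel
    check_vector_buffer(np.zeros((SIZE, 3), dtype=np.float32), 4, SIZE)
except ValueError as exc:
    rejected_msg = str(exc)
print("rejected:", rejected_msg)
```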

I don't know about OpenCL in Python, but I do pass double, int, double8, or any other OpenCL type to kernels.
Suppose that N is an integer, alpha a double, and vect a double8.
What I do is:

clSetKernelArg(kernel, 0, sizeof(int),  &N);
clSetKernelArg(kernel, 18, sizeof(double), &alpha);
clSetKernelArg(kernel, 11, sizeof(cl_double8), &vect);

Hope it helps. Éric.
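On the PyOpenCL side, the role of the explicit sizeof(...) argument is played by the byte size of the object passed. A minimal numpy sketch of host values whose sizes match the three C calls above, assuming the OpenCL-defined sizes (cl_int is 4 bytes, cl_double is 8 bytes, so cl_double8 is 64 bytes); the values themselves are made up for illustration.

```python
import numpy as np

N = np.int32(100)                       # 4 bytes, like sizeof(int) / cl_int
alpha = np.float64(0.5)                 # 8 bytes, like sizeof(double)
vect = np.arange(8, dtype=np.float64)   # 64 bytes, like sizeof(cl_double8)

for name, value in [("N", N), ("alpha", alpha), ("vect", vect)]:
    print(name, value.nbytes)
```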

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow