Question

While trying to find the maximum parameter size I can pass when invoking an OpenCL kernel, I found that I can pass an array larger than the reported maximum. Here is what I see (I am using pyopencl):

>>> plat = cl.get_platforms()
>>> dev = plat[0].get_devices( cl.device_type.ALL )
>>> dev[0]
<pyopencl.Device 'Juniper' on 'AMD Accelerated Parallel Processing' at 0x58fde60>
>>> dev[0].max_parameter_size
1024

From a Google search, I learned that the 1024 is in bytes. (I forget where it was stated; I think it was an Nvidia forum.)

Now, I ran this script:

import pyopencl as cl
import numpy as np

plat = cl.get_platforms()
dev = plat[0].get_devices( cl.device_type.ALL )
ctx = cl.Context( [ dev[0] ] )
cq = cl.CommandQueue( ctx )

kernel = """
__kernel void test( __global int* A, __global int* B ){
    const int id = get_global_id( 0 );
    B[ id ] = A[ id ];
    barrier( CLK_GLOBAL_MEM_FENCE );
}
"""

prg = cl.Program( ctx, kernel ).build()

A = np.ones( ( 2**18, ), dtype = np.int32 )
B = np.zeros_like( A )

A_buf = cl.Buffer( ctx, cl.mem_flags.READ_ONLY|cl.mem_flags.COPY_HOST_PTR, hostbuf = A )   
B_buf = cl.Buffer( ctx, cl.mem_flags.WRITE_ONLY, B.nbytes )

Before calling the kernel, I checked the non-zero elements of both arrays:

>>> A.nonzero()[0].shape
(262144,)
>>> B.nonzero()[0].shape
(0,)

Then I called the kernel and checked for non-zero elements in B:

>>> prg.test( cq, A.shape, A_buf, B_buf ).wait()
>>> cl.enqueue_copy( cq, B, B_buf )
>>> B.nonzero()[0].shape
(262144,)

So, clearly, I can send and read back an array larger than the device's max_parameter_size. How is this possible? Where am I going wrong?

Solution

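CL_DEVICE_MAX_PARAMETER_SIZE is the maximum size, in bytes, of the arguments passed to a kernel through clSetKernelArg. A __global pointer argument such as A or B counts only as the buffer handle, not as the data it points to, so a 1 MiB buffer comes nowhere near the 1024-byte limit. The limits that apply to the buffers themselves are CL_DEVICE_MAX_MEM_ALLOC_SIZE (the largest single allocation) and CL_DEVICE_GLOBAL_MEM_SIZE (total global memory); see both in clGetDeviceInfo.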
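
To make the distinction concrete, here is a rough sketch that compares the size of the kernel arguments with the size of the buffer. The helper check_limits and the stand-in device are hypothetical; the attribute names (max_parameter_size, address_bits, max_mem_alloc_size) are the pyopencl.Device properties. It assumes each __global pointer argument costs about one device pointer, which is implementation-dependent, and only the 1024 figure comes from the question.

```python
from types import SimpleNamespace

def check_limits(dev, n_pointer_args, buffer_bytes):
    """Compare kernel-argument size and buffer size against device limits.

    `dev` is any object exposing the pyopencl.Device attributes used below.
    """
    # Assumption: each __global pointer argument costs about one device pointer.
    arg_bytes = n_pointer_args * (dev.address_bits // 8)
    return {
        "arg_bytes": arg_bytes,
        "args_fit": arg_bytes <= dev.max_parameter_size,
        "buffer_fits": buffer_bytes <= dev.max_mem_alloc_size,
    }

# Stand-in device: 1024 comes from the question; the other values are hypothetical.
fake_dev = SimpleNamespace(
    max_parameter_size=1024,
    address_bits=32,
    max_mem_alloc_size=128 * 2**20,
)

# Two pointer arguments (A and B); A holds 2**18 int32 values, i.e. 1 MiB.
report = check_limits(fake_dev, n_pointer_args=2, buffer_bytes=2**18 * 4)
print(report)
```

With a real device, pass dev[0] from the script in the question instead of fake_dev. The point is that the argument size stays a few bytes no matter how large the buffer is; only the buffer size is checked against the allocation limit.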

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow