As already pointed out by @Marco13, the kernel suffers from quite a few issues.
When running this kernel through a tool like clcc you can see that there are a number of compilation errors to begin with:
> clcc matmul.cl
"/tmp/OCLu7FyFF.cl", line 1: error: identifier "_global" is undefined
__kernel void multiply(_global int outputC, _global int inputA,
^
"/tmp/OCLu7FyFF.cl", line 1: error: invalid combination of type specifiers
__kernel void multiply(_global int outputC, _global int inputA,
^
"/tmp/OCLu7FyFF.cl", line 1: error: identifier "_global" is undefined
__kernel void multiply(_global int outputC, _global int inputA,
^
"/tmp/OCLu7FyFF.cl", line 1: error: invalid combination of type specifiers
__kernel void multiply(_global int outputC, _global int inputA,
^
"/tmp/OCLu7FyFF.cl", line 2: error: identifier "_global" is undefined
_global int inputB)
^
"/tmp/OCLu7FyFF.cl", line 2: error: invalid combination of type specifiers
_global int inputB)
^
6 errors detected in the compilation of "/tmp/OCLu7FyFF.cl".
A tool like clcc is very useful for catching errors early on. Most vendors also have their own standalone kernel compiler/checker: for example, Intel has its Kernel Builder, and AMD's CodeXL contains a static kernel analyzer. Another option is to get the compiler output right from your host code, by calling clGetProgramBuildInfo (with CL_PROGRAM_BUILD_LOG) after clBuildProgram has returned CL_BUILD_PROGRAM_FAILURE.
Once these compilation errors are fixed (the address space qualifier is spelled __global, with two underscores), your kernel is still not doing what you expect. As noted, the inputs and outputs should be pointers, since you will be passing buffers to the kernel. The indexing of your input and output arrays is also incorrect: in the for-loop, inputA[row * 3 + 1] should be inputA[row * 3 + i] (i instead of 1). When saving the result to outputC, I would expect outputC[row * 3 + col] (row * 3 instead of row + 3).
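To sanity-check the corrected index expressions without a device, here is a minimal CPU reference in plain C. It is only a sketch: the full kernel is not reproduced here, so it assumes 3x3 row-major int matrices, one work-item per (row, col) pair, and an inputB index of i * 3 + col. The names multiply_item and multiply_ref are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N 3 /* matrix dimension, assumed from the "* 3" in the index expressions */

/* CPU stand-in for a single work-item: computes C[row][col].
 * In the kernel, row and col would come from get_global_id(). */
void multiply_item(int *outputC, const int *inputA, const int *inputB,
                   int row, int col)
{
    int sum = 0;
    for (int i = 0; i < N; i++) {
        /* inputA is walked along its row (index i, not the constant 1);
         * the inputB index is an assumption for row-major storage. */
        sum += inputA[row * N + i] * inputB[i * N + col];
    }
    /* row * N, not row + N, selects the start of the output row. */
    outputC[row * N + col] = sum;
}

/* Runs every "work-item" sequentially, like a 3x3 NDRange would. */
void multiply_ref(int *outputC, const int *inputA, const int *inputB)
{
    assert(outputC != NULL && inputA != NULL && inputB != NULL);
    for (int row = 0; row < N; row++) {
        for (int col = 0; col < N; col++) {
            multiply_item(outputC, inputA, inputB, row, col);
        }
    }
}
```

Comparing the buffer read back from the device against the output of multiply_ref for the same inputs is a cheap way to catch indexing mistakes like the two above.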
I haven't looked in detail at the host code, but I would at least make sure, especially when just starting out with OpenCL, to always check every return code and error. This will save you a lot of time and frustration.
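One way to make that checking less tedious is a small wrapper around every call that returns a status. The following is just a sketch under some assumptions: cl_status_name, cl_ok and CL_OK are hypothetical helpers, not part of OpenCL, and the status values are written as literals (mirroring the constants in CL/cl.h) only so the snippet compiles without the OpenCL headers. In real host code you would include the header and use the named constants.

```c
#include <stdio.h>

/* Readable names for a few common OpenCL status codes.
 * Only a handful are listed; extend as needed. */
const char *cl_status_name(int status)
{
    switch (status) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unlisted OpenCL status";
    }
}

/* Hypothetical helper: returns 1 on success; on failure, reports the
 * call site and status on stderr and returns 0. */
int cl_ok(int status, const char *call, const char *file, int line)
{
    if (status == 0) {
        return 1;
    }
    fprintf(stderr, "%s:%d: %s failed with %s (%d)\n",
            file, line, call, cl_status_name(status), status);
    return 0;
}

/* Wrap any expression that evaluates to an OpenCL status code. */
#define CL_OK(call) cl_ok((call), #call, __FILE__, __LINE__)
```

In host code this would be used along the lines of if (!CL_OK(clBuildProgram(program, 1, &device, NULL, NULL, NULL))) { ... }, with the failure branch fetching the build log and cleaning up.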
Finally, if you want a quick jump-start to learning OpenCL with a hands-on approach, I would strongly recommend going through the open-source Hands-on OpenCL training by Simon McIntosh-Smith and Tom Deakin. It doesn't take very long, is quite pragmatic, and provides lots of useful insights. Optimizing matrix multiplication is one of the use cases it walks through step by step.