If your data is stored in row-major order (row by row), do this:
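```c
__kernel void padding(__global float* newMa, __global const float* oldMa,
                      int oldR, int oldC, int N)
{
    int id = get_global_id(0);      // one work-item per element of the N x N output
    if (id >= N * N)                // guard in case the global size was rounded up
        return;
    int r = id / N;                 // row in the padded matrix
    int c = id % N;                 // column in the padded matrix
    float value = 0.0f;             // padding value
    if (r < oldR && c < oldC)       // inside the old matrix: both indices in range
        value = oldMa[r * oldC + c];
    newMa[id] = value;
}
```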
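The new matrix must have room for N x N elements (with N >= oldR and N >= oldC), and the kernel should be launched with a global work size of at least N*N, one work-item per output element.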
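To check the indexing on the host before running it on the device, a plain C loop can stand in for the work-items. This is only a sketch that mirrors the kernel above, assuming row-major storage and N >= oldR, N >= oldC; the function name `pad_row_major` is hypothetical.

```c
/* Hypothetical CPU reference for the padding kernel: each loop iteration
   plays the role of one work-item with global id `id`.
   Assumes row-major storage and N >= oldR, N >= oldC. */
void pad_row_major(float *newMa, const float *oldMa, int oldR, int oldC, int N)
{
    for (int id = 0; id < N * N; ++id) {
        int r = id / N;              /* row in the padded matrix */
        int c = id % N;              /* column in the padded matrix */
        float value = 0.0f;          /* padding value */
        if (r < oldR && c < oldC)    /* inside the old matrix */
            value = oldMa[r * oldC + c];
        newMa[id] = value;
    }
}
```

For example, padding the 2x3 matrix `{1, 2, 3, 4, 5, 6}` to N = 4 gives the rows `1 2 3 0`, `4 5 6 0`, followed by two rows of zeros. Comparing this output with what the kernel writes is a quick way to catch swapped row and column sizes.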
I don't know whether you are using this memory layout. Could you show how your other kernels expect the data to be laid out? As the other answer says, you probably don't need a separate kernel for such a simple operation; you can integrate the padding into your other kernel instead.
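One way to integrate it, sketched here under the assumption that the consuming kernel reads the matrix element by element, is to never build the padded copy at all and instead read through an accessor that returns zero outside the original bounds. The helper `padded_at` is hypothetical; the same body should also work in OpenCL C if `m` is declared `__global const float *`.

```c
/* Hypothetical accessor: reads element (r, c) of a row-major oldR x oldC
   matrix as if it had been zero-padded. Indices outside the original
   matrix return 0.0f without touching memory. Assumes r, c >= 0. */
static inline float padded_at(const float *m, int oldR, int oldC, int r, int c)
{
    return (r < oldR && c < oldC) ? m[r * oldC + c] : 0.0f;
}
```

The trade-off is an extra comparison on every read instead of an extra kernel launch and an N x N buffer; which is cheaper depends on how often the other kernel reads each element.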