There are a few issues here.
First, your code is not data parallel. You have a race condition on 'res', so this code cannot be computed on the GPU as written.
Secondly, the range of execution is far too small. You are trying to execute only 6 threads (x in [2,3,4] times y in [3,4]), which will not gain any real benefit from the GPU.
To answer your question about how you might implement this over the 2D grid above, you can use a two-dimensional range and offset the global ids:
Range range = Range.create2D(3, 2); // A two-dimensional grid, 3x2
Kernel kernel = new Kernel() {
    @Override
    public void run() {
        int x = getGlobalId(0) + 2; // x starts at 2
        int y = getGlobalId(1) + 3; // y starts at 3
        ...
    }
};
kernel.execute(range);
kernel.dispose(); // release the kernel's resources when done
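
One way to get rid of the race on 'res' is to have every work-item write to its own slot of an output array and then combine the slots afterwards on the host. The sketch below is plain Java with no Aparapi dependency: the nested loops only stand in for the 3x2 grid of work-items, and `compute`, `workItem`, `runGrid` and `reduce` are hypothetical names, since I don't know what your elided kernel body actually does. Treat it as an illustration of the pattern, not as your exact fix.

```java
import java.util.Arrays;

public class GridSketch {
    static final int WIDTH = 3;    // x takes the values 2, 3, 4
    static final int HEIGHT = 2;   // y takes the values 3, 4
    static final int X_OFFSET = 2; // x starts at 2
    static final int Y_OFFSET = 3; // y starts at 3

    // Hypothetical per-element computation; stands in for the elided kernel body.
    static int compute(int x, int y) {
        return x * y;
    }

    // What a single work-item would do: it writes only to its own slot,
    // so no two work-items ever touch the same memory location.
    static void workItem(int gidX, int gidY, int[] partial) {
        int x = gidX + X_OFFSET;
        int y = gidY + Y_OFFSET;
        partial[gidY * WIDTH + gidX] = compute(x, y);
    }

    // Sequential stand-in for executing the 3x2 range.
    static int[] runGrid() {
        int[] partial = new int[WIDTH * HEIGHT];
        for (int gidY = 0; gidY < HEIGHT; gidY++) {
            for (int gidX = 0; gidX < WIDTH; gidX++) {
                workItem(gidX, gidY, partial);
            }
        }
        return partial;
    }

    // Combine the per-work-item results after all work-items have finished.
    static int reduce(int[] partial) {
        return Arrays.stream(partial).sum();
    }

    public static void main(String[] args) {
        int[] partial = runGrid();
        System.out.println(Arrays.toString(partial)); // [6, 9, 12, 8, 12, 16]
        System.out.println(reduce(partial));          // 63
    }
}
```

In a real kernel the body of `workItem` would go inside `run()`, with `gidX` and `gidY` coming from `getGlobalId(0)` and `getGlobalId(1)`, and the final sum would be done on the host once `execute` returns.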