What kind of computations are you after? If they are compute-intensive, such as N-body gravity simulations, you can simply copy the variables to the GPU, compute, and copy the results back to main memory.
If your objects carry a lot of data but need little computation per element, as in fluid dynamics or collision detection, you should add interoperability between your graphics API and your compute API. Then you can run the computations without copying any data across the bus. (The speed-up is roughly your GPU RAM bandwidth divided by your PCI-e bandwidth. For an HD7870 it is about 25x, provided compute power is not already saturated.)
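The bandwidth ratio above can be checked with a few lines of arithmetic. This is only a rough sketch: the class name is made up, and both bandwidth figures are assumptions (a spec-sheet memory bandwidth for the HD7870 and a guessed achievable PCI-e throughput), not measurements.

```java
// Rough estimate of the interop speed-up for a bandwidth-bound workload.
// InteropSpeedup is a hypothetical helper; the figures in main are assumptions.
final class InteropSpeedup {

    /** Ratio of on-card memory bandwidth to host-to-device bus bandwidth. */
    static double estimate(double gpuMemGBps, double busGBps) {
        if (gpuMemGBps <= 0 || busGBps <= 0) {
            throw new IllegalArgumentException("bandwidths must be positive");
        }
        return gpuMemGBps / busGBps;
    }

    public static void main(String[] args) {
        double gpuMem = 153.6; // assumed: HD7870 spec-sheet memory bandwidth in GB/s
        double bus = 6.0;      // assumed: achievable PCI-e throughput in GB/s
        System.out.printf("estimated speed-up: ~%.1fx%n", estimate(gpuMem, bus));
    }
}
```

The estimate only holds while the kernels are limited by memory traffic; once the compute units are saturated, removing the copies helps much less.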
I used JOCL and LWJGL with GL/CL interoperability in Java, and they worked very well.
For example, a neural network was trained on the CPU (Encog), then evaluated on the GPU (JOCL) to generate a map that was drawn with LWJGL (the neuron weights were changed a little to add some randomness).
The very important parts are:
- Start a GL context.
- Use the GL context's handle variables to start an interoperable CL context.
- Create GL buffers.
- Create CL buffers with the interoperable CL context.
- Don't forget to call clFinish() when OpenCL is done and GL is ready to start.
- Don't forget to call glFinish() when OpenGL is done and CL is ready to start.
- Using an OpenCL kernel builder/table class and a buffer scheduler class helps when you have tens of different kernels and many buffers shared between GL and CL, and you need them to run in a specific order.
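The finish-then-hand-over ordering in the list can be modelled in plain Java without touching a GPU. The sketch below is only a model: SharedBufferModel and its method names are hypothetical, and no real GL or CL calls are made. In real code the hand-over would typically also go through clEnqueueAcquireGLObjects and clEnqueueReleaseGLObjects, which the model folds into the two hand-over methods.

```java
// Plain-Java model of which API may touch a shared GL/CL buffer at a given time.
// Hypothetical class for illustration; it makes no OpenGL or OpenCL calls.
final class SharedBufferModel {

    enum Owner { GL, CL }

    // The buffer is created by GL, so GL owns it first.
    private Owner owner = Owner.GL;

    /** Stands in for issuing GL draw calls that read the buffer. */
    void draw() { require(Owner.GL, "draw"); }

    /** Stands in for enqueueing a CL kernel that writes the buffer. */
    void runKernel() { require(Owner.CL, "runKernel"); }

    /** Models glFinish() followed by handing the buffer to CL. */
    void glFinishThenHandToCl() {
        require(Owner.GL, "glFinish");
        owner = Owner.CL;
    }

    /** Models clFinish() followed by handing the buffer back to GL. */
    void clFinishThenHandToGl() {
        require(Owner.CL, "clFinish");
        owner = Owner.GL;
    }

    Owner owner() { return owner; }

    private void require(Owner expected, String action) {
        if (owner != expected) {
            throw new IllegalStateException(
                action + " needs " + expected + " to own the buffer, but owner is " + owner);
        }
    }
}
```

One frame then reads draw(), glFinishThenHandToCl(), runKernel(), clFinishThenHandToGl(); calling runKernel() while GL still owns the buffer throws, which is the mistake the two "don't forget" items guard against.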
Example:
// clh is a fictional class that binds OpenCL to OpenGL through interoperability
// registering needed kernels to this object
clh.addKernel(
kernelFactory.fluidDiffuse(1024,1024), // enumeration is fluid1
kernelFactory.fluidAdvect(1024,1024), // enumeration is fluid2
kernelFactory.rigidBodySphereSphereInteracitons(2048,32,32),
kernelFactory.fluidRigidBodyInteractions(false), // fluidRigid
kernelFactory.rayTracingShadowForFluid(true),
kernelFactory.rayTracingBulletTargetting(true),
kernelFactory.gravity(G),
kernelFactory.gravitySphereSphere(), // enumeration is fall
kernelFactory.NNBotTargetting(3,10,10,2,numBots) // Encog
);
clh.addBuffers(
// enumeration is buf1 and is used as fluid1, fluid2 kernels' arguments
bufferFactory.fluidSurfaceVerticesPosition(1024,1024, fluid1, fluid2),
// enumeration is buf2, used by fluid1 and fluid2
bufferFactory.fluidSurfaceColors(1024,1024,fluid1, fluid2),
// enumeration is buf3, used by network
bufferFactory.NNBotTargetting(numBots*25, Encog)
);
Running kernels:
// shortcut of a sequence of kernels
int[] fluidCalculations = new int[]{fluid1, fluid2, fluidRigid, fluid1};
clh.run(fluidCalculations); // runs the registered kernels
// diffuses, advects, sphere-fluid interaction, diffuse again
//When any update of GPU-buffer from main-memory is needed:
clh.sendData(cpuBuffer, buf1); // updates fluid surface position from main-memory.
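Since clh is fictional, here is a minimal sketch of what the kernel-table part of such a class could look like. KernelTable is a hypothetical name, and Runnable stands in for a compiled OpenCL kernel with its arguments already bound; a real version would enqueue kernels on a command queue instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical kernel table: registers kernels once, then runs them by id in any order.
final class KernelTable {

    // Runnable is a stand-in for a compiled kernel with its arguments already set.
    private final List<Runnable> kernels = new ArrayList<>();

    /** Registers a kernel and returns its id, which is its position in the table. */
    int addKernel(Runnable kernel) {
        kernels.add(kernel);
        return kernels.size() - 1;
    }

    /** Runs the registered kernels in the order given; ids may repeat. */
    void run(int... ids) {
        for (int id : ids) {
            kernels.get(id).run();
        }
    }

    public static void main(String[] args) {
        KernelTable table = new KernelTable();
        int fluid1 = table.addKernel(() -> System.out.println("diffuse"));
        int fluid2 = table.addKernel(() -> System.out.println("advect"));
        // Diffuse, advect, then diffuse again, as one reusable sequence.
        table.run(fluid1, fluid2, fluid1);
    }
}
```

Keeping the sequence as a plain int array makes it cheap to store several named shortcuts and replay them every frame.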
Converting CPU code to OpenCL code can be done automatically by APARAPI, but I'm not sure whether it supports interoperability.
If you need to do it yourself, then it is as easy as:
From Java:
for(int i=0;i<numParticles;i++)
{
for(int j=0;j<numParticles;j++)
{
particle.get(i).calculateAndAddForce(particle.get(j));
}
}
To a JOCL kernel string (actually very similar to the inside of calculateAndAddForce):
"__kernel void nBodyGravity(__global float * positions,__global float *forces)" +
"{" +
" int indis=get_global_id(0);" +
" int totalN=" + n + "; "+
" float x0=positions[0+3*(indis)];"+
" float y0=positions[1+3*(indis)];"+
" float z0=positions[2+3*(indis)];"+
" float fx=0.0f;" +
" float fy=0.0f;" +
" float fz=0.0f;" +
" for(int i=0;i<totalN;i++)" +
" { "+
" float x1=positions[0+3*(i)];" +
" float y1=positions[1+3*(i)];" +
" float z1=positions[2+3*(i)];" +
" float dx = x0-x1;" +
" float dy = y0-y1;" +
" float dz = z0-z1;" +
" float r=sqrt(dx*dx+dy*dy+dz*dz+0.01f);" +
" float tr=0.1f/r;" +
" float tr2=tr*tr*tr;" +
" fx+=tr2*dx*0.0001f;" +
" fy+=tr2*dy*0.0001f;" +
" fz+=tr2*dz*0.0001f;" +
" } "+
" forces[0+3*(indis)]+=fx; " +
" forces[1+3*(indis)]+=fy; " +
" forces[2+3*(indis)]+=fz; " +
"}"
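When porting a loop like this, it helps to keep a CPU version that uses the same flat array layout as the kernel, so the GPU output can be compared against it. The sketch below mirrors the kernel's arithmetic line by line; NBodyReference is a hypothetical name, and the particle count is taken from the array length rather than from n.

```java
// CPU reference for the nBodyGravity kernel above, using the same xyz-interleaved layout.
// Hypothetical helper class, intended only for checking the GPU result.
final class NBodyReference {

    /** Adds each particle's summed force to forces; both arrays hold 3 floats per particle. */
    static void accumulateForces(float[] positions, float[] forces) {
        int totalN = positions.length / 3;
        for (int indis = 0; indis < totalN; indis++) { // one iteration per work item
            float x0 = positions[3 * indis];
            float y0 = positions[3 * indis + 1];
            float z0 = positions[3 * indis + 2];
            float fx = 0.0f, fy = 0.0f, fz = 0.0f;
            for (int i = 0; i < totalN; i++) {
                float dx = x0 - positions[3 * i];
                float dy = y0 - positions[3 * i + 1];
                float dz = z0 - positions[3 * i + 2];
                // The 0.01f softening term keeps r non-zero for the self-pair.
                float r = (float) Math.sqrt(dx * dx + dy * dy + dz * dz + 0.01f);
                float tr = 0.1f / r;
                float tr2 = tr * tr * tr;
                fx += tr2 * dx * 0.0001f;
                fy += tr2 * dy * 0.0001f;
                fz += tr2 * dz * 0.0001f;
            }
            forces[3 * indis] += fx;
            forces[3 * indis + 1] += fy;
            forces[3 * indis + 2] += fz;
        }
    }
}
```

Compare against the GPU buffer with a small tolerance rather than exact equality, since the device's sqrt and the order of float operations may differ slightly from the CPU's.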