Question

My GPU seems to allow 562% use of global memory and 133% use of local memory for a simple PyOpenCL matrix addition kernel. Here is what my script prints:

GPU: GeForce GTX 670

Global Memory - Total: 2 GB
Global Memory - One Buffer: 3.750000 GB
Number of Global Buffers: 3
Global Memory - All Buffers: 11.250000 GB
Global Memory - Usage: 562.585844 %

Local Memory - Total:  48 KB
Local Memory - One Array: 32.000000 KB
Number of Local Arrays: 2
Local Memory - All Arrays: 64.000000 KB
Local Memory - Usage: 133.333333 %

If I increase global memory use much above this point, I get the error: mem object allocation failure

If I increase local memory use above this point, I get the error: invalid work group size

Why doesn't my script fail immediately when local or global memory use exceeds 100%?


Solution

The sizes are computed by multiplying by 32; that is the error.

A float32 is 32 bits, which is 4 bytes, so each element of the a and b arrays takes 4 bytes, not 32.

So the proper results for you would be:

Global Memory - Total: 2 GB
Global Memory - One Buffer: 0.4687500 GB
Number of Global Buffers: 3
Global Memory - All Buffers: 1.40625 GB
Global Memory - Usage: 70.3125 %

Local Memory - Total:  48 KB
Local Memory - One Array: 4.000000 KB
Number of Local Arrays: 2
Local Memory - All Arrays: 8.000000 KB
Local Memory - Usage: 16.6666666 %
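
The corrected figures above can be reproduced with a short calculation. This is only a sketch: the original script is not shown, so the helper usage_percent and the element counts are hypothetical, back-computed from the numbers in this answer under the assumption that 1 GB = 1024**3 bytes and 1 KB = 1024 bytes.

```python
import struct

# Size of one float32 in bytes: 32 bits / 8 = 4.
BYTES_PER_FLOAT32 = struct.calcsize("=f")


def usage_percent(num_elements, num_arrays, total_bytes,
                  bytes_per_element=BYTES_PER_FLOAT32):
    """Return the memory used by the arrays as a percentage of total_bytes."""
    used_bytes = num_elements * bytes_per_element * num_arrays
    return 100.0 * used_bytes / total_bytes


# Hypothetical element counts, inferred from the figures in this answer.
global_elements = 125_829_120   # 0.46875 GB per buffer / 4 bytes
local_elements = 1024           # 4 KB per local array / 4 bytes

global_total = 2 * 1024**3      # 2 GB of global memory
local_total = 48 * 1024         # 48 KB of local memory

# Correct: 4 bytes per float32 element.
global_pct = usage_percent(global_elements, 3, global_total)
local_pct = usage_percent(local_elements, 2, local_total)

# The mistake: treating the 32 (bits) in "float32" as a byte count.
wrong_global_pct = usage_percent(global_elements, 3, global_total,
                                 bytes_per_element=32)
wrong_local_pct = usage_percent(local_elements, 2, local_total,
                                bytes_per_element=32)

print(f"Global usage: {global_pct:.4f} % (miscounted: {wrong_global_pct:.4f} %)")
print(f"Local usage:  {local_pct:.4f} % (miscounted: {wrong_local_pct:.4f} %)")
```

With NumPy arrays, reading the array's nbytes attribute (or dtype.itemsize) avoids hard-coding the element size at all. The question's script printed 562.585844 % rather than exactly 562.5 %, which suggests the device reports slightly less than 2 GB of global memory, so the real global figure is likely a little above 70.3125 %.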
Licensed under: CC-BY-SA with attribution