OpenGL 4.0 GPU draw feature?

https://stackoverflow.com/questions/5047286

15-11-2019

Question

In Wikipedia's and other sources' descriptions of OpenGL 4.0, I read about this feature:

Drawing of data generated by an external API such as OpenCL, without CPU intervention.
What does this refer to?

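Edit:

It seems this must be referring to draw_indirect, which I believe ties into GPU compute interop (basically OpenCL/CUDA). It looks as though there are some caveats and tricks to keeping the calls on the GPU for any extended length of time past the second run, but it should be possible.
If anyone can elaborate on how draw commands can be issued without the CPU, or on how draw indirect works, please feel free. It would be greatly appreciated.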

Solution

I believe you may be referring to the GL_ARB_draw_indirect functionality, which allows OpenGL to source the DrawArrays or DrawElements parameters from a GPU buffer object that can be filled by OpenGL or OpenCL.

If I'm not mistaken, it's included in core OpenGL 4.

Other answers

I haven't figured out what OpenGL 4.0 specifically does to make this feature work, since as far as I understand it existed before as well. I'm not sure if this answers your question, but I'll tell you what I know about the subject anyway.

It refers to a situation where a library other than OpenGL, such as OpenCL or CUDA, produces some data directly in the graphics card's memory, and then OpenGL picks up where the other library left off and uses that data as a

pixel buffer object (PBO), when you want to draw the data to the screen as is;
texture, when you want to use the graphics data as part of some other scene;
vertex buffer object (VBO), when you want to use the produced data as arbitrary attribute input for the vertex shader (one example might be a particle system that is simulated with CUDA and rendered with OpenGL).

In a situation like this, it's a very good idea to keep the data on the graphics card the whole time and not copy it around, and especially not through the CPU, because the PCIe bus is very slow compared to the graphics card's own memory bus.

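Here's some sample code that does this with CUDA and OpenGL for VBOs and PBOs. It uses the older cudaGL* interop functions; newer CUDA versions deprecate them in favor of cudaGraphicsGLRegisterBuffer and cudaGraphicsMapResources: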

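// once, at startup: create the buffer and give it storage before registering it with CUDA
GLuint id;
glGenBuffers(1, &id);                                                 // (count, pointer to names)
glBindBuffer(GL_ARRAY_BUFFER, id);
glBufferData(GL_ARRAY_BUFFER, size_in_bytes, NULL, GL_DYNAMIC_DRAW); // size_in_bytes: placeholder
glBindBuffer(GL_ARRAY_BUFFER, 0);
cudaGLRegisterBufferObject(id);

// for every frame
void *ptr;                          // the runtime API fills a void *, not a CUdeviceptr
cudaGLMapBufferObject(&ptr, id);
// <launch kernel here that writes through ptr>
cudaGLUnmapBufferObject(id);        // unmap before OpenGL uses the buffer
// <now use the buffer "id" with OpenGL>

// once, at shutdown
cudaGLUnregisterBufferObject(id);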

And here's how you can load the data into a texture:

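glBindBuffer(GL_PIXEL_UNPACK_BUFFER, id);
glBindTexture(GL_TEXTURE_2D, your_tex_id);  // a 256x256 texture allocated earlier with glTexImage2D
// with a PBO bound, the last argument is a byte offset into the buffer, not a CPU pointer
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, 0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);    // unbind so later uploads read from CPU memory again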

Also note that if you use a less common format instead of GL_RGBA, the upload may be slower because the driver has to convert every value.
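The upload above reads 256 x 256 RGBA bytes from the buffer, so the buffer needs at least that much storage; with other formats, the row padding imposed by GL_UNPACK_ALIGNMENT (default 4) also matters. Here is a small C sketch for sizing the buffer, assuming one-byte components (GL_UNSIGNED_BYTE) and GL_UNPACK_ROW_LENGTH left at 0. The helper names are hypothetical, not part of any API.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per row that glTexSubImage2D reads from the bound
 * GL_PIXEL_UNPACK_BUFFER, for 1-byte components, when
 * GL_UNPACK_ROW_LENGTH is 0 and GL_UNPACK_ALIGNMENT is `alignment`. */
static size_t unpack_row_stride(size_t width, size_t components, size_t alignment)
{
    /* GL only accepts 1, 2, 4 or 8 for GL_UNPACK_ALIGNMENT */
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    size_t row_bytes = width * components;
    /* each row starts on a multiple of the alignment */
    return (row_bytes + alignment - 1) / alignment * alignment;
}

/* Buffer size to allocate (e.g. with glBufferData) so a width x height
 * upload fits. stride * height may exceed the strict minimum by the
 * padding of the last row, which is harmless. */
static size_t unpack_buffer_size(size_t width, size_t height,
                                 size_t components, size_t alignment)
{
    return unpack_row_stride(width, components, alignment) * height;
}
```

For the 256 x 256 GL_RGBA case this gives 262144 bytes. A 3-pixel-wide GL_RGB row is 9 bytes but is padded to 12 under the default alignment, which is one reason tightly packed RGB data often needs glPixelStorei(GL_UNPACK_ALIGNMENT, 1).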

I don't know OpenCL, but the idea is the same; only the function names are different.

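Another way to do the same thing is what is called host pinned memory (in CUDA, mapped or "zero-copy" memory). In that approach, a range of page-locked CPU memory is mapped into the GPU's address space, so the GPU reads and writes it directly over the PCIe bus instead of through a separate copy.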

To understand what this feature is, you must understand how things worked before.

Before 4.0, OpenCL could fill OpenGL buffer objects with data. Indeed, regular OpenGL commands could fill OpenGL buffer objects with data, either with transform feedback or by rendering to a buffer texture. This data could be vertex data to be used for rendering.

Only the CPU can initiate the rendering of vertex data (by calling one of the glDraw* functions). Even so, there is no need for explicit synchronization here (beyond whatever OpenCL/OpenGL interop requires). Specifically, the CPU doesn't have to read back data written by GPU operations.

But this leads to a problem. If OpenCL, or whatever GPU operation, always writes a known number of vertices to the buffer, then everything is fine. However, this does not have to be the case. It is often desirable for a GPU process to write an arbitrary number of vertices. Obviously there needs to be a maximum limit (the size of the buffer). But other than that, you want it to be able to write whatever it wants.
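As a rough illustration of a GPU process that writes an arbitrary number of vertices bounded by the buffer size, here is a plain C simulation of the usual append pattern: every work item that has something to emit reserves an output slot with an atomic counter. The names emit_alive_vertices, Vertex and the alive flags are hypothetical; on the GPU each loop iteration would be one OpenCL/CUDA work item and the counter would live in GPU memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vertex;

/* CPU simulation of a GPU kernel that appends only "alive" vertices.
 * Each work item grabs a slot with an atomic add (what atomic_inc or
 * atomicAdd would do on the GPU); slots past `capacity`, the size of
 * the output buffer, are dropped. Returns how many vertices were stored,
 * i.e. the count that would otherwise have to be sent to glDrawArrays. */
static unsigned emit_alive_vertices(const Vertex *in, const int *alive, size_t n,
                                    Vertex *out, unsigned capacity)
{
    atomic_uint counter = 0;
    for (size_t i = 0; i < n; ++i) {        /* one iteration per work item */
        if (!alive[i])
            continue;
        unsigned slot = atomic_fetch_add(&counter, 1u);
        if (slot < capacity)
            out[slot] = in[i];
    }
    unsigned requested = atomic_load(&counter);
    /* the counter can overshoot the buffer, so clamp it */
    return requested < capacity ? requested : capacity;
}
```

On a real GPU the order in which work items claim slots is not deterministic; only the final count is. That count is exactly the value the CPU does not know in advance.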

The problem is that OpenCL decides how many vertices to write, but the CPU needs that number in order to call one of the glDraw* functions. If OpenCL wrote 22,000 vertices, the CPU has to pass 22,000 to glDrawArrays, which means reading that count back from the GPU first.

What ARB_draw_indirect (a core feature of GL 4.0) does is allow a GPU process to write values into a buffer object that represent the parameters you would pass to a glDraw* function. The only parameters not covered are the primitive type and, for indexed draws, the index type; the CPU still passes those.

Note that the CPU still controls when the rendering happens. The CPU still decides which buffers the vertex data is pulled from. So OpenCL can write several of these glDraw* commands, but until the CPU actually calls glDrawElementsIndirect for one of them, nothing gets rendered.

So what you can do is run an OpenCL process that writes some data to existing buffer objects. Then you bind those buffers using the usual vertex setup, such as a VAO. The OpenCL process also writes the appropriate rendering-command data to other buffer objects, which you bind as indirect buffers (GL_DRAW_INDIRECT_BUFFER). Then you use glDraw*Indirect to render these commands.
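To make the division of labour concrete: the producer may write several commands back to back into one indirect buffer, and the CPU chooses which one to run by byte offset alone. In the sketch below, simulate_draw_arrays_indirect is a hypothetical CPU stand-in for what the GPU does when glDrawArraysIndirect executes. It reads the command itself, so the caller never sees count or first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t count, primCount, first, reservedMustBeZero;
} DrawArraysIndirectCommand;

/* Simulated GPU-side execution of one indirect draw: fetch the command at
 * byte `offset` in the indirect buffer and return how many vertex-shader
 * invocations it produces. `vertex_capacity` is the size of the vertex
 * buffer, which the command must stay inside. */
static uint32_t simulate_draw_arrays_indirect(const unsigned char *indirect_buffer,
                                              size_t offset, uint32_t vertex_capacity)
{
    DrawArraysIndirectCommand cmd;
    memcpy(&cmd, indirect_buffer + offset, sizeof cmd);  /* read "GPU memory" */
    assert(cmd.reservedMustBeZero == 0);
    assert(cmd.count <= vertex_capacity && cmd.first <= vertex_capacity - cmd.count);
    return cmd.count * cmd.primCount;
}
```

With two commands written by the producer, issuing only the second one corresponds to glDrawArraysIndirect(mode, (const void *)sizeof(DrawArraysIndirectCommand)) with the command buffer bound to GL_DRAW_INDIRECT_BUFFER; the first command is simply never executed.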

At no time does the CPU have to read data back from the GPU.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow