OpenGL Uniform Buffer std140 layout, a driver bug or did I misunderstand the specification?

https://stackoverflow.com/questions/7451476

21-01-2021
|

Question

The OpenGL specification lies (or is this a bug?)... Referring to the layout for std140, with shared uniform buffers, it states:

"The set of rules shown in Tabl e L-1 are used by the GLSL compiler to layout members in a std140-qualified uniform block. The offsets of members in the block are accumulated based on the sizes of the previous members in the block (those declared before the variable in question), and the starting offset. The starting offset of the first member is always zero.

Scalar variable type (bool, int, uint, float) - Size of the scalar in basic machine types"

(http://www.opengl-redbook.com/appendices/AppL.pdf)

So, armed with this information, I setup a uniform block in my shader that looks something like this:

// Spotlight.

layout (std140) uniform Spotlight
{
    float Light_Intensity;
    vec4  Light_Ambient;
    vec3  Light_Position;   
};

... only to discover it doesn't work with the subsequent std140 layout I setup on the CPU side. That is the first 4 bytes are a float (size of the machine scalar type for GLfloat), the next 16 bytes are a vec4 and the following 12 bytes are a vec3 (with 4 bytes left over on the end to take account of the rule that a vec3 is really a vec4).

When I change the CPU side to specify a float as being the same size as a vec4, i.e. 16 bytes, and do my offsets and buffer size making this assumption, the shader works as intended.

So, either the spec is wrong or I've misunderstood the meaning of "scalar" in this context, or ATI have a driver bug. Can anyone shed any light on this mystery?

Solution

That PDF you linked to is not the OpenGL specification. I don't know where you got it from, but that is certainly not the full list of rules. Always check your sources; the spec is not as unreadable as many claim it to be.

Yes, the size of variables of basic types is the same size as the basic machine type (ie: 4 bytes). But size alone does not determine the position of the variable.

Each type has a base alignment, and no matter where that type is found in a uniform block, it's overall byte offset must fit that alignment. The base alignment of a vec4 is 4 * the alignment of its basic type (ie: float). So the base alignment of a vec4 is 16.

Because Light_Intensity ends after 4 bytes, the compiler must insert 12 bytes of padding, because Light_Ambient cannot be on a 4-byte boundary. It must be on a 16-byte boundary, so the compiler uses 12 bytes of empty space.

ATI does have a few driver bugs around std140 layout, but this isn't one of them.

As a general rule, I like to explicitly put padding into my structures, and I avoid vec3 (because it has 16 byte alignment). Doing these generally cuts down on compiler bugs as well as accidental misunderstanding about where things go and how much room they actually take.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow