Performance gain using interleaved attribute arrays in OpenGL 4.0

Question 1

Is it also true for OpenGL running on desktop GPUs?

As a general rule, you should use interleaved attributes wherever possible. The exception is when you need to change certain attributes and not others: interleaving the ones that change with those that don't is not a good idea.
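To make "interleaved" concrete, here is a minimal sketch of a per-vertex record and the stride and offsets that go with it. The `Vertex` struct, the `AttribLayout` type, the attribute indices and `interleaved_layout` are all hypothetical names chosen for illustration; only `glVertexAttribPointer`, mentioned in the comment, is a real OpenGL call, and it is not invoked here so that the snippet compiles without a GL context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: all attributes of one vertex sit together. */
typedef struct {
    float position[3];
    float normal[3];
    float uv[2];
} Vertex;

/* Where one attribute lives inside the interleaved buffer. */
typedef struct {
    int    components; /* number of floats in the attribute */
    size_t stride;     /* bytes from one vertex to the next */
    size_t offset;     /* byte offset of the attribute inside a vertex */
} AttribLayout;

/* Hypothetical attribute locations; a real program would match its shader. */
enum { ATTRIB_POSITION = 0, ATTRIB_NORMAL = 1, ATTRIB_UV = 2, ATTRIB_COUNT = 3 };

/*
 * Returns the layout of one attribute. With the interleaved buffer bound,
 * each entry would typically be passed on as:
 *   glVertexAttribPointer(attrib, components, GL_FLOAT, GL_FALSE,
 *                         (GLsizei)stride, (const void *)offset);
 */
AttribLayout interleaved_layout(int attrib)
{
    static const AttribLayout table[ATTRIB_COUNT] = {
        { 3, sizeof(Vertex), offsetof(Vertex, position) },
        { 3, sizeof(Vertex), offsetof(Vertex, normal)   },
        { 2, sizeof(Vertex), offsetof(Vertex, uv)       },
    };
    assert(attrib >= 0 && attrib < ATTRIB_COUNT);
    return table[attrib];
}
```

Every attribute shares the same stride (the size of the whole record) and differs only in its offset. An attribute that is rewritten every frame could instead be given its own buffer with its own, smaller stride, leaving the static attributes interleaved.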

How big can the performance gain theoretically be?

I can't really answer that, but I wouldn't expect a huge improvement. The only sure way to know is to measure.

Question 2

For any optimization to yield a performance gain, it must optimize something that is actually a performance bottleneck. If the thing you optimize is not currently a bottleneck, improving it will not necessarily improve performance.

There is no way to answer your question, because any performance gain depends first on whether you are bottlenecked on vertex transfer performance (i.e. what this optimizes). Unless you are pushing your graphics card so hard that vertex transfer, rather than the vertex shader, the fragment shader, or the CPU, is the limiting factor, this won't matter.

And there's no way to know how large the gain would be, because different hardware will respond differently, and different situations will respond differently depending on how tight the bottleneck is.

Just interleave your attributes. It costs you nothing, requires minimal time or effort, and may be of non-trivial value performance-wise.
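As a rough illustration of how little work the conversion takes, the sketch below packs three separate attribute arrays into one interleaved array on the CPU. The function name `interleave_vertices` and the 3 + 3 + 2 float layout (position, normal, UV) are assumptions made for the example, not something the question specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: 3 position + 3 normal + 2 UV floats per vertex. */
enum { FLOATS_PER_VERTEX = 8 };

/*
 * Copies vertex_count vertices from three separate arrays into `out`,
 * which must have room for vertex_count * FLOATS_PER_VERTEX floats.
 */
void interleave_vertices(const float *positions, const float *normals,
                         const float *uvs, size_t vertex_count, float *out)
{
    size_t i;
    assert(positions && normals && uvs && out);
    for (i = 0; i < vertex_count; ++i) {
        float *v = out + i * FLOATS_PER_VERTEX;
        v[0] = positions[3 * i + 0];
        v[1] = positions[3 * i + 1];
        v[2] = positions[3 * i + 2];
        v[3] = normals[3 * i + 0];
        v[4] = normals[3 * i + 1];
        v[5] = normals[3 * i + 2];
        v[6] = uvs[2 * i + 0];
        v[7] = uvs[2 * i + 1];
    }
}
```

The resulting array can then be uploaded as a single buffer, with each attribute described by the same stride and a different offset. This is a one-time cost at load time for static meshes.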

Question 3

The benefit of interleaved attribute arrays is memory locality. All the data needed for a vertex is stored contiguously, so it can be fetched more efficiently than data spread across multiple buffers.
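The locality argument can be put into numbers with a small back-of-the-envelope model. The sketch below counts how many cache lines the fetch of a single vertex touches in each layout. The 64-byte line size, the 12/12/8-byte attribute sizes, the assumption that every buffer starts on a line boundary, and all function names are hypothetical; real GPUs differ in fetch granularity and caching, so treat this as an illustration rather than a prediction.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; actual hardware may use a different value. */
enum { CACHE_LINE_BYTES = 64 };

/* Number of cache lines overlapped by the byte range [offset, offset + size). */
size_t lines_spanned(size_t offset, size_t size)
{
    size_t first, last;
    assert(size > 0);
    first = offset / CACHE_LINE_BYTES;
    last  = (offset + size - 1) / CACHE_LINE_BYTES;
    return last - first + 1;
}

/* Interleaved: one 32-byte record per vertex in a single buffer. */
size_t lines_interleaved(size_t vertex_index)
{
    return lines_spanned(vertex_index * 32, 32);
}

/*
 * Separate: position (12 bytes), normal (12 bytes) and UV (8 bytes) live in
 * three different buffers, so they can never share a cache line.
 */
size_t lines_separate(size_t vertex_index)
{
    return lines_spanned(vertex_index * 12, 12)   /* position buffer */
         + lines_spanned(vertex_index * 12, 12)   /* normal buffer */
         + lines_spanned(vertex_index * 8, 8);    /* UV buffer */
}
```

Under these assumptions an interleaved vertex always fits in one line, while the separate layout touches at least three, and more when an attribute straddles a line boundary. The model looks at one vertex in isolation: when vertices are read strictly in order, neighbouring vertices share lines in both layouts, so the gap is most relevant for indexed or otherwise scattered access.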

With a large number of vertices and many attributes per vertex, the difference in performance may become noticeable. What counts as "large" and "many" should be established by profiling.

Question 4

I might be wrong, but my understanding is that the GPU needs all of a vertex's data (vertices, normals, and UV maps) when rendering, say, one vertex of a triangle, and the buffers for vertices, normals, and UV maps can be large for an object such as a large sphere (rendered with glvertex not glsphere)...

The GPU has to go back and forth between the vertices, normals, and UV maps even while rendering a small rectangle, because it can't keep all of that data in its own internal storage.

Communication over the bus is generally slower than the processor speed.

In this case interleaved arrays are a great gain: they reduce bus communication, the GPU can process them easily, and all the data for a particular set of vertices being rendered is available together.