Question

I have a simple OpenGL test app in C which draws different things in response to key input. (Mesa 8.0.4, tried with Mesa-EGL and with GLFW, Ubuntu 12.04LTS on a PC with NVIDIA GTX650). The draws are quite simple/fast (rotating triangle type of stuff). My test code does not limit the framerate deliberately in any way, it just looks like this:

while (true)
{
    draw();
    swap_buffers();
}

I have timed this very carefully, and I find that the time from one eglSwapBuffers() (or glfwSwapBuffers) call to the next is ~16.6 milliseconds. The time from after a call to eglSwapBuffers() to just before the next call is only a little bit less than that, even though what is drawn is very simple. The time that the swap buffers call takes is well under 1ms.

However, the time from the app changing what it's drawing in response to the key press to the change actually showing up on screen is >150ms (approx. 8-9 frames' worth). This was measured with a camera recording the screen and keyboard at 60fps. (Note: it is true I do not have a way to measure how long it takes from the key press to the app receiving it. I am assuming it is <<150ms.)

Therefore, the questions:

  1. Where are graphics buffered between a call to swap buffers and actually showing up on screen? Why the delay? It sure looks like the app is drawing many frames ahead of the screen at all times.

  2. What can an OpenGL application do to cause an immediate draw to screen? (ie: no buffering, just block until draw is complete; I don't need high throughput, I do need low latency)

  3. What can an application do to make the above immediate draw happen as fast as possible?

  4. How can an application know what is actually on screen right now? (Or, how long/how many frames the current buffering delay is?)


Solution

  • Where are graphics buffered between a call to swap buffers and actually showing up on screen? Why the delay? It sure looks like the app is drawing many frames ahead of the screen at all times.

The swap command is queued. Whatever has been drawn to the back buffer waits until the next vsync (if you have set a swap interval), and at that vsync the buffer is displayed.
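The swap interval is set explicitly through the windowing API. A fragment for illustration (not runnable on its own; `dpy` is assumed to be your initialized `EGLDisplay`):

```c
/* 1 = wait for vsync (the usual driver default), 0 = swap immediately. */
eglSwapInterval(dpy, 1);

/* Or, with GLFW, after making the context current: */
glfwSwapInterval(1);
```

Setting the interval to 0 removes the vsync wait, which raises the frame rate but allows tearing.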

  • What can an OpenGL application do to cause an immediate draw to screen? (ie: no buffering, just block until draw is complete; I don't need high throughput, I do need low latency)

Calling glFinish ensures that everything queued has been drawn before the call returns, but you have no control over when the result actually reaches the screen beyond the swapInterval setting.
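A common (if heavy-handed) pattern is to finish right after the swap, so the CPU cannot run several frames ahead of the GPU. A fragment, assuming an EGL setup with `dpy` and `surface` already created (this trades throughput for a bounded queue):

```c
eglSwapBuffers(dpy, surface);
glFinish();  /* block until the GPU has consumed everything queued so far */
```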

  • What can an application do to make the above immediate draw happen as fast as possible? How can an application know what is actually on screen right now? (Or, how long/how many frames the current buffering delay is?)

Generally you can use a sync extension (something like http://www.khronos.org/registry/egl/extensions/NV/EGL_NV_sync.txt) to find this out.
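The same idea is available portably as fence objects on desktop GL 3.2+ (or via ARB_sync). A fragment showing the shape of it, with `dpy`/`surface` assumed from an EGL setup; the fence is signalled once the GPU has executed everything queued before it:

```c
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
eglSwapBuffers(dpy, surface);

/* Wait up to 100 ms for the GPU to reach the fence; measuring how long
 * this wait takes tells you how far the GPU is lagging behind the CPU. */
glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                 (GLuint64)100 * 1000 * 1000 /* ns */);
glDeleteSync(fence);
```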

Are you sure your method of measuring latency is correct? What if the key input actually has significant delay on your PC? Have you measured the latency from the event being received in your code to the point just after swap buffers?

OTHER TIPS

You must understand that the GPU has dedicated memory available on board. At the most basic level this memory holds the encoded pixels you see on your screen (it is also used for hardware acceleration and other things, but that is unimportant here). Because it takes time to load a frame from main RAM into GPU RAM, you can get a flickering effect: for a brief moment you see the background instead of what is supposed to be displayed. Although this copying happens extremely fast, it is noticeable to the human eye and quite annoying.

To counter this, we use a technique called double buffering. It works by having an additional frame buffer in GPU RAM (there can be one or several extra buffers, depending on the graphics library and the GPU, but two are enough) plus a pointer indicating which frame should be displayed. While the first frame is being displayed, you are already creating the next one in your code using some draw() function; it is then copied to GPU RAM (while the previous frame is still showing), and when you call eglSwapBuffers() the pointer switches to your back buffer (I guessed this from your question; I'm not familiar with OpenGL, but the mechanism is quite universal). You can imagine that this pointer switch takes very little time. I hope you see now that directly writing an image to the screen would actually cause much more delay (and annoying flicker).

Also, ~16.6 milliseconds does not sound like much: it is exactly one frame period at 60 Hz. I think most of that time is spent creating/setting up the required data structures and not in the drawing computations themselves (you could test this by drawing only the background).

Lastly, I would like to add that I/O is usually pretty slow (the slowest part of most programs) and that 150ms is not that long at all (still about twice as fast as the blink of an eye).

Ah, yes, you've discovered one of the peculiarities of the interaction between OpenGL and display systems that only a few people actually understand (and to be frank, I didn't fully understand it either until about two years ago). So here is what is happening:

SwapBuffers does two things:

  1. it queues a (private) command into the same command queue used for OpenGL drawing calls, which essentially flags a buffer swap to the graphics system
  2. it makes OpenGL flush all queued drawing commands (to the back buffer)

Apart from that, SwapBuffers does nothing by itself. But those two things have interesting consequences. One is that SwapBuffers will return immediately. However, as soon as the "the back buffer is to be swapped" flag is set (by the queued command), the back buffer becomes locked against any operation that would alter its contents. So as long as no call is made that would alter the contents of the back buffer, nothing blocks. Commands that would alter the back buffer's contents will stall the OpenGL command queue until the back buffer has been swapped and released for further commands.

Now the length of the OpenGL command queue is an abstract thing. But the usual behavior is that one of the OpenGL drawing commands will block, waiting for the queue to drain once the buffer swap has actually happened.

I suggest you spray your program with logging statements using some high performance, high resolution timer as clock source to see where exactly the delay happens.

Latency will be determined both by the driver, and by the display itself. Even if you wrote directly to the hardware, you would be limited by the latter.

The application can only do so much (i.e. draw fast, process inputs as closely as possible to or during drawing, perhaps even modify the buffer at the time of flip) to mitigate this. After that you're at the mercy of other engineers, both hardware and software.

And you can't tell what the latency is without external monitoring, as you've done.

Also, don't assume your input (keyboard to app) is low latency either!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow