Question

Edit: I posted a nice solution to the whole renderer separation problem in general below.

I have recently been playing around with OpenGL in a multithreaded X11 environment. I found the following tutorial, which compiles, links and runs fine.

But then I came across a strange issue while trying to adapt the code for my own needs.

In the tutorial, the calling order of XCreateWindow, glXCreateContext, XSelectInput and XSetWMProtocols is as follows:

param[i].win = XCreateWindow(param[i].d_, root, 200,200, 
                   300,200, 0, visInfo->depth, InputOutput, visInfo->visual,
                   CWColormap,
                   &windowAttr);
param[i].ctx = glXCreateContext(param[i].d_, visInfo,  NULL, True);
XSelectInput(d, param[i].win, StructureNotifyMask);
XSetWMProtocols(d, param[i].win, &(delMsg), 1);

Please note that XCreateWindow and XSelectInput/XSetWMProtocols use different display connections.

However, when changing the order of the calls to

param[i].win = XCreateWindow(param[i].d_, root, 200,200, 
                   300,200, 0, visInfo->depth, InputOutput, visInfo->visual,
                   CWColormap,
                   &windowAttr);
XSelectInput(d, param[i].win, StructureNotifyMask);
XSetWMProtocols(d, param[i].win, &(delMsg), 1);
param[i].ctx = glXCreateContext(param[i].d_, visInfo,  NULL, True);

the program fails with

X Error of failed request: BadWindow (invalid Window parameter)
Major opcode of failed request: 2 (X_ChangeWindowAttributes)
Resource id in failed request: 0x5000002
Serial number of failed request: 17
Current serial number in output stream: 18

which seems to be caused by XSetWMProtocols.

Since different display connections were being used, I would not be surprised if the whole thing didn't work in the first place. But somehow, after the call to glXCreateContext, everything seems to be magically fine.

I am relatively new to X11/GLX programming; did I miss something? What kind of magic does glXCreateContext perform? Or did something else happen? Or should I simply move on, because OpenGL and multithreading always seem to cause problems?

My solution:

I was lazy and just used the approach from the tutorial. That worked until I added FreeType to my project, which suddenly gave me a BadWindow crash again. So even if everything seems fine, when you are working from different threads, X11 is seriously messing around with some memory while you are not looking. (It was not me; I checked with valgrind.)

My current solution is what n.m. suggested in a comment: I put everything (X11 and GL/GLX calls) into a GUI thread, whose resources are never touched by other threads. However, two things have to be kept in mind, because they might slow down your rendering loop:

  • Slow message processing delays rendering (as stated by ilmale below)
  • Slow rendering delays message processing (my concern)

The first problem can easily be fixed. Create an STL deque or list, or any container, where you enqueue the XEvents relevant to your app logic and then grab them from another thread. Just make sure that your STL is thread-safe and, if in doubt, implement your own queue. With a condition variable waiting on the container's size you can even simulate blocking calls like XNextEvent.
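
A minimal sketch of such a queue, assuming C++11; the class name EventQueue is mine, not from the tutorial:

#include <X11/Xlib.h>
#include <condition_variable>
#include <deque>
#include <mutex>

// Thread-safe event queue: the GUI thread pushes XEvents, the logic
// thread pops them. waitAndPop() blocks on a condition variable,
// which mimics the blocking behaviour of XNextEvent.
class EventQueue {
public:
    void push(const XEvent& ev) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            events_.push_back(ev);
        }
        cond_.notify_one();
    }

    // Blocks until an event is available, like XNextEvent does.
    XEvent waitAndPop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return !events_.empty(); });
        XEvent ev = events_.front();
        events_.pop_front();
        return ev;
    }

    // Non-blocking variant for polling from the logic thread.
    bool tryPop(XEvent& ev) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (events_.empty())
            return false;
        ev = events_.front();
        events_.pop_front();
        return true;
    }

private:
    std::deque<XEvent> events_;
    std::mutex mutex_;
    std::condition_variable cond_;
};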

The second problem is tricky. You might argue that if the renderer is running at 1 fps or slower, the game or application is useless anyway. That is true. But it would be neat if you were able to process some kill signal (e.g. the destroy-window atom) even at 0.1 fps. The only solution I could think of is checking for new messages after rendering every thousand sprites or so, sending them to your container and continuing rendering. Of course, in that case you can never let the rendering thread run user scripts or other unknown code. But I guess that would make the idea of separating rendering from the other threads pointless anyway.
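
A rough sketch of that interleaving in the GUI/render thread, reusing the EventQueue sketch above; drawSpriteBatch and kNumBatches are hypothetical stand-ins for the application's own rendering code:

#include <GL/glx.h>
#include <X11/Xlib.h>

void drawSpriteBatch(int batch);   // hypothetical: draws ~1000 sprites
const int kNumBatches = 50;        // hypothetical batch count

// GUI/render thread: after each batch of draw work, drain whatever the
// X server has queued and hand it to the logic thread, so that a close
// request is still noticed even at very low frame rates.
void renderFrame(Display* dpy, GLXDrawable drawable, EventQueue& queue) {
    for (int batch = 0; batch < kNumBatches; ++batch) {
        drawSpriteBatch(batch);

        while (XPending(dpy) > 0) {   // non-blocking check for new events
            XEvent ev;
            XNextEvent(dpy, &ev);     // does not block here: events are pending
            queue.push(ev);
        }
    }
    glXSwapBuffers(dpy, drawable);
}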

Hope this helps.



Solution

I agree with n.m., and I'm the guy that wrote the tutorial. :D The problem I was trying to solve was to decouple the event loop from the rendering loop, so I could react to events without affecting the rendering and vice versa. I was writing a Lua framework, and my "processMessage(event)" function could potentially call a user-defined Lua function.

While I was writing the event loop I had a lot of issues like the one you had. I also tried XCB, which worked on Fedora but crashed on Ubuntu. After a lot of headaches I found the solution: different display connections (to the X server it looks like it is serving different processes) with a shared GL context, and another thread for loading (textures and meshes).
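
The sharing itself goes through the third argument of glXCreateContext. A minimal sketch of that part, shown on a single display connection purely to illustrate the share argument (dpy and visInfo are assumed to be set up as in the tutorial):

#include <GL/glx.h>
#include <X11/Xlib.h>

// Two contexts that share textures, display lists, etc.
GLXContext createSharedContexts(Display* dpy, XVisualInfo* visInfo,
                                GLXContext* loaderCtxOut) {
    // Context used by the render thread.
    GLXContext renderCtx = glXCreateContext(dpy, visInfo, NULL, True);

    // Context for the loader thread; passing renderCtx as the share
    // list makes resources uploaded there visible to the renderer.
    *loaderCtxOut = glXCreateContext(dpy, visInfo, renderCtx, True);
    return renderCtx;
}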

Returning to your problem:

XSetWMProtocols(...)

wants the same display connection the window was created on, but apparently only on some versions of X. That's why I'm now using Qt.
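
Applied to the snippet from the question, that means keeping the window creation and the WM calls on one and the same connection; a sketch under that assumption:

// All window-related calls go through the same display connection
// (param[i].d_ here) instead of mixing it with the second connection d.
Display* dpy = param[i].d_;

param[i].win = XCreateWindow(dpy, root, 200, 200,
                             300, 200, 0, visInfo->depth, InputOutput,
                             visInfo->visual, CWColormap, &windowAttr);
XSelectInput(dpy, param[i].win, StructureNotifyMask);
XSetWMProtocols(dpy, param[i].win, &delMsg, 1);
param[i].ctx = glXCreateContext(dpy, visInfo, NULL, True);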

OTHER TIPS

I have basically gone through the same trials of multithreaded X11 and Win32 in a cross-platform project.

One thing I've noticed is that X11 isn't modifying memory as much as the posts above indicate. Sure, there is some strange ordering of various commands, but once you get it right it seems to be rather stable.

Specifically, one item that nearly made me throw in the towel was background GPU processing! It was a very odd and hard-to-catch runtime race condition that had me thinking the OS was to blame.

After sending textures to the card inside a display list (cough, when implementing FreeType as well), immediately drawing the resource would sometimes cause slight corruption of the font display list, even for later draws. The display list itself was corrupted, and I even resorted to implementing a global OpenGL lock just to prove the threading wasn't to blame. But WHY was it getting corrupted? The OS? Nope, the GPU.

I believe shared GLX contexts force a different behavior on some cards, notably NVIDIA on my system. It wasn't the other threads causing my woes, but rather the share argument on the createContext call combined with the lack of a glFinish() before using the resource. That, and a few best practices, which I'll explain below.

In 99% of the runs, it would work fine without the glFinish() even with multithreading. Only upon load does the condition occur, so constantly stopping/restarting the app would eventually expose it. If it loaded everything without issue, the app would run fine from there on. If there were problems, the image would stay corrupted until I reloaded it.

All of the issues were fixed by adhering to these simple rules; a condensed sketch follows the list.

  1. Create the 2nd GL context in the non-main() thread. (Don't create both contexts in the same thread and hand the 2nd thread the pointer; it is NOT stable that way.)
  2. When loading a resource in the 2nd thread, add a glFinish() prior to placing the result on the queue to be used. (Simply put: glFinish() before sharing resources.)
  3. Once you call makeCurrent() on the 2nd context inside the 2nd thread, call the getCurrentContext() function and wait for it to be non-NULL before letting either thread perform other OpenGL resource loading or calls. On some video cards, makeCurrent() returns on the 2nd thread while getCurrentContext() stays NULL for a short while. I am not sure why or how the driver lets that happen, but the check saves the app from crashing.
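
A condensed sketch of a loader thread that follows all three rules; the LoaderArgs struct, uploadTexture and pushFinishedTexture are hypothetical stand-ins for the application's own code, and the loader is assumed to get its own drawable (e.g. a pbuffer or hidden window):

#include <GL/gl.h>
#include <GL/glx.h>
#include <X11/Xlib.h>

struct LoaderArgs {                  // hypothetical
    Display*     dpy;                // the loader thread's display connection
    GLXDrawable  drawable;           // a drawable for the loader to bind
    XVisualInfo* visInfo;
    GLXContext   renderCtx;          // the render thread's context (share source)
};
GLuint uploadTexture(const char* path);   // hypothetical: glTexImage2D etc.
void   pushFinishedTexture(GLuint tex);   // hypothetical: thread-safe handoff

void* loaderThread(void* arg) {
    LoaderArgs* a = static_cast<LoaderArgs*>(arg);

    // Rule 1: create the 2nd (shared) context inside this thread.
    GLXContext loaderCtx =
        glXCreateContext(a->dpy, a->visInfo, a->renderCtx, True);
    glXMakeCurrent(a->dpy, a->drawable, loaderCtx);

    // Rule 3: some drivers report a NULL current context for a short
    // while even after glXMakeCurrent has returned; wait it out.
    while (glXGetCurrentContext() == NULL) { /* spin or sleep briefly */ }

    GLuint tex = uploadTexture("font.png");   // illustrative resource

    // Rule 2: make sure the upload has really finished on the GPU
    // before the render thread is allowed to touch the texture.
    glFinish();
    pushFinishedTexture(tex);
    return NULL;
}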

I implemented those practices in my 6+-thread app, and the odd one-off corruption problems vanished, never to return.

It turns out that X11 is not so mean in my experience... the video card is, but it's really just picky more than anything. In my case I'm even using typedefs to write Linux/Windows code with non-specific functions, which complicates things further, and still this is a manageable beast if proper precautions are taken :).

It's quirky, but not an "avoid at all costs" issue if you ask me. I hope this post helps, and good luck!

Licensed under: CC-BY-SA with attribution