Why does my lock-free message queue segfault?

Question 1

Your lock-free queue fails because you did not start with at least a semi-formal proof of correctness and then turn that proof into an algorithm, with the proof as the primary text and comments tying each step of the proof to the code that implements it.

Unless you are copying the implementation of someone who did do that, any attempt to write a lock-free algorithm will fail. If you are copying someone else's implementation, please link to it.

Lock-free algorithms are not robust unless you have a proof that they are correct, because the errors that make them fail are subtle and extreme care is needed to avoid them. Simply "rolling your own" lock-free algorithm, even if it shows no apparent problems during testing, is a recipe for unreliable code.

One way to avoid writing a formal proof yourself is to track down someone who has written proven-correct pseudocode or the like. Sketch out that pseudocode, together with its proof of correctness, in comments. Then fill in the code between the comments.
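As an illustration of that style, here is a minimal sketch of a well-known and widely analysed design: a bounded single-producer/single-consumer ring buffer. The class name `SpscQueue` and its interface are hypothetical, not taken from the question. The comments carry the invariants the correctness argument rests on, and the design is only correct with exactly one producer thread and exactly one consumer thread.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical bounded single-producer / single-consumer queue.
// Correct ONLY with exactly one thread calling push() and one calling pop().
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N >= 2, "need at least one usable slot");
    // Invariant I1: slots in [tail_, head_) (mod N) hold published values and
    //   are owned by the consumer; every other slot is owned by the producer.
    // Invariant I2: only the producer writes head_; only the consumer writes tail_.
    // One slot is always left empty, so head_ == tail_ means "empty" and
    // (head_ + 1) % N == tail_ means "full".
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next slot the producer will fill
    std::atomic<std::size_t> tail_{0};  // next slot the consumer will read

public:
    // Producer thread only.
    bool push(const T& v) {
        const std::size_t h = head_.load(std::memory_order_relaxed);  // I2: we own head_
        const std::size_t next = (h + 1) % N;
        // acquire pairs with the consumer's release store to tail_: once tail_
        // has moved past a slot, the consumer has finished reading it.
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[h] = v;  // I1: slot h is producer-owned; nobody else touches it
        // release makes the write to buf_[h] visible before the slot is published.
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only.
    std::optional<T> pop() {
        const std::size_t t = tail_.load(std::memory_order_relaxed);  // I2: we own tail_
        // acquire pairs with the producer's release store to head_.
        if (t == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T v = buf_[t];  // I1: slot t is consumer-owned and fully written
        // release hands slot t back to the producer only after we have read it.
        tail_.store((t + 1) % N, std::memory_order_release);
        return v;
    }
};
```

With `N == 4` the queue holds at most three elements, because one slot is sacrificed to distinguish "full" from "empty".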

In general, proving that an "almost correct" lock-free algorithm is flawed is harder than writing a solid proof that a lock-free algorithm is correct when implemented in a particular way, and then implementing it that way. And if your algorithm is so flawed that its flaws are easy to find, that suggests you don't yet have a basic understanding of the problem domain.

In short, by asking "why is my algorithm wrong?", you are approaching lock-free algorithms the wrong way. "Where is the flaw in my proof?" or "I proved this pseudocode correct here, then implemented it; why do my tests show deadlocks?" are good lock-free questions. "Here is a bunch of code whose comments merely describe what the next line does, with no comments explaining why it does it or how that line maintains my lock-free invariants" is not a good lock-free question.

Step back. Find some proven-correct algorithms. Learn how the proofs work. Implement some proven-correct algorithms by monkey-see, monkey-do. Read the footnotes for the issues their proofs overlooked (such as the ABA problem). Once you have a few of those under your belt, try a variation: do the proof, check the proof, do the implementation, and check the implementation.
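One well-known issue of that kind is the ABA problem: a compare-and-swap on a stack head succeeds because the head "looks unchanged", even though the node was popped and pushed back in between, so the CAS installs a stale `next`. A common remedy is to pair the value with a version tag that changes on every successful update. The sketch below is hypothetical (`IndexStack` and `kNil` are illustrative names): a Treiber-style stack over a fixed array of node indices, with a 32-bit tag packed next to a 32-bit index. It assumes `std::atomic<std::uint64_t>` is lock-free on the target, and the tag can in principle still wrap after 2^32 updates.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical Treiber-style stack of node indices (e.g. a free list).
// head_ packs (tag << 32) | index; every successful CAS bumps the tag, so a
// thread holding a stale snapshot cannot succeed after an A -> B -> A sequence.
class IndexStack {
public:
    static constexpr std::uint32_t kNil = 0xFFFFFFFFu;

    explicit IndexStack(std::uint32_t capacity) : next_(capacity) {
        for (auto& n : next_) n.store(kNil, std::memory_order_relaxed);
    }

    // Push index i; precondition: i < capacity and i is not on the stack.
    void push(std::uint32_t i) {
        std::uint64_t old = head_.load(std::memory_order_relaxed);
        for (;;) {
            next_[i].store(index_of(old), std::memory_order_relaxed);
            const std::uint64_t desired = pack(tag_of(old) + 1, i);
            // release publishes next_[i] to whichever thread later pops i.
            if (head_.compare_exchange_weak(old, desired, std::memory_order_release,
                                            std::memory_order_relaxed))
                return;
            // On failure compare_exchange_weak reloaded `old`; retry.
        }
    }

    // Pop an index, or return kNil if the stack is empty.
    std::uint32_t pop() {
        std::uint64_t old = head_.load(std::memory_order_acquire);
        for (;;) {
            const std::uint32_t i = index_of(old);
            if (i == kNil) return kNil;
            // next_ is atomic because another thread may pop i and push it back
            // (rewriting next_[i]) while we read it here.
            const std::uint32_t nxt = next_[i].load(std::memory_order_relaxed);
            // If head_ changed at all since `old` was read, the tag differs and
            // this CAS fails, even if the index is back to i (the ABA case).
            const std::uint64_t desired = pack(tag_of(old) + 1, nxt);
            if (head_.compare_exchange_weak(old, desired, std::memory_order_acquire,
                                            std::memory_order_acquire))
                return i;
        }
    }

private:
    static std::uint64_t pack(std::uint32_t tag, std::uint32_t idx) {
        return (static_cast<std::uint64_t>(tag) << 32) | idx;
    }
    static std::uint32_t tag_of(std::uint64_t v) { return static_cast<std::uint32_t>(v >> 32); }
    static std::uint32_t index_of(std::uint64_t v) { return static_cast<std::uint32_t>(v); }

    std::vector<std::atomic<std::uint32_t>> next_;
    std::atomic<std::uint64_t> head_{pack(0, kNil)};
};
```

Using indices into a fixed array, rather than heap pointers, also sidesteps the use-after-free that pointer-based stacks suffer when a popped node is freed while another thread still reads its `next` field.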

Question 2

If I understand your code correctly, there are data races. For example, consider this interleaving:

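// producer (step 1): reads the current write index
int r0 = write_index.load();        // r0 == 0

// consumer (step 2): flips the index and takes the old write queue
int r1 = write_index.fetch_xor(1);  // r1 == 0, write_index is now 1
queue& active = queues[r1];         // active is queues[0]
active.size();                      // reads queues[0]

// producer (step 3): still uses the stale index
queues[r0].push_back(...);          // writes queues[0]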

Now both threads access the same queue, queues[0], at the same time, and the producer's push_back modifies it while the consumer reads it. That is a data race, and in C++ a data race is undefined behaviour.
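If a lock is acceptable, one straightforward way to remove this particular race is to make the producer's push and the consumer's buffer swap mutually exclusive, so the consumer takes ownership of the whole batch in a single step. The sketch below assumes the original design is a two-buffer (double-buffered) queue; the class `DoubleBufferQueue` and its methods are hypothetical, and it deliberately swaps in a mutex, so it is correct but not lock-free.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical double-buffered queue: producers append to pending_,
// the consumer swaps the whole batch out in one step.
// NOTE: this uses a mutex; it avoids the race above but is not lock-free.
template <typename T>
class DoubleBufferQueue {
public:
    void push(T v) {
        std::lock_guard<std::mutex> lock(m_);
        pending_.push_back(std::move(v));
    }

    // Returns everything pushed since the previous call. The swap happens under
    // the same mutex as push(), so no producer can still be writing into the
    // buffer the consumer receives.
    std::vector<T> take_all() {
        std::vector<T> batch;
        {
            std::lock_guard<std::mutex> lock(m_);
            batch.swap(pending_);
        }
        return batch;  // processed without holding the lock
    }

private:
    std::mutex m_;
    std::vector<T> pending_;
};
```

The lock is held only for a `push_back` or a pointer-sized swap, so contention is usually low; the consumer processes the returned batch without holding it.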