I don't really get how it works.
It relies on two assumptions:
- reads and writes to boolean variables are atomic;
- all threads have a uniform view of memory, so that modifications made on one thread will be visible to others within a short amount of time without an explicit memory barrier.
The first is likely to hold on any sane architecture. The second holds on any single-core architecture, and on the multi-core architectures in widespread use today, but there's no guarantee that it will continue to hold in the future.
The code above could be safely rewritten in C++11 by using std::atomic.
Today, it can and should be. In 2001, when the article was written, not so much.
If we have more than one write, we may have problems with memory ordering.
Indeed. If this mechanism is used for synchronisation with other data, then we're relying on a third assumption: that modification order is preserved. Again, most popular processors give that behaviour, but there's no guarantee that this will continue.
Why do so many people use volatile bool for busy waiting?
Because they can't or won't change habits they formed before C++ acquired a multi-threaded memory model.
And is it really portable?
No. The C++11 memory model doesn't guarantee any of these assumptions, and there's a good chance that they will become impractical for future hardware to support, as the typical number of cores grows. volatile was never a solution for thread synchronisation, and that goes doubly now that the language does provide the correct solutions.