I don't really get how it works.
It relies on two assumptions:
- reads and writes to boolean variables are atomic;
- all threads have a uniform view of memory, so that modifications made on one thread will be visible to others within a short amount of time without an explicit memory barrier.
The first is likely to hold on any sane architecture. The second holds on any single-core architecture, and on the multi-core architectures in widespread use today, but there's no guarantee that it will continue to hold in the future.
The code above could be safely rewritten in C++11 by using std::atomic.
Today, it can and should be. In 2001, when the article was written, not so much.
If we have more than one write, we may have problems with memory ordering.
Indeed. If this mechanism is used for synchronisation with other data, then we're relying on a third assumption: that modification order is preserved. Again, most popular processors give that behaviour, but there's no guarantee that this will continue.
Why do so many people use volatile bool for busy waiting?
Because they can't or won't change habits they formed before C++ acquired a multi-threaded memory model.
And is it really portable?
No. The C++11 memory model doesn't guarantee any of these assumptions, and there's a good chance that they will become impractical for future hardware to support, as the typical number of cores grows. volatile was never a solution for thread synchronisation, and that goes doubly now that the language does provide the correct solutions.