Thanks for pointing this out. Yes, I think that the sample code (which, in its defense, was intended to be concise) is broken. It should probably wait for the threads to finish.
Any implementation that allows threads to be blocked in wait() is almost certainly going to involves some kind of condition variable, and destroying the latch while a thread has not yet exited wait() is potentially undefined.
I don't know if there's time to update the paper, but I can make sure that the next version is fixed.
Alasdair