Thanks for the useful comments on my question. I have now worked out that the poor performance comes from the way threads are scheduled on RHEL 5 in a VM that uses only one of my CPU cores. The problem goes away when the VM uses multiple cores. On a single core, the following modification to the Boost example (making the producer sleep for a short time whenever the queue is full, instead of spinning) greatly improves overall speed:
void producer(void)
{
    for (int i = 0; i != iterations; ++i) {
        int value = ++producer_count;
        while (!spsc_queue.push(value))
            usleep(1000); // WAS: ; (empty statement)
    }
}
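
For anyone who wants to try the same back-off idea without Boost or `usleep`, here is a minimal, self-contained sketch using only the C++11 standard library. It is not the Boost implementation: `ToySpscQueue` and `run_with_backoff` are hypothetical names I made up for illustration, the ring buffer is a simplified stand-in for `boost::lockfree::spsc_queue`, and `std::this_thread::sleep_for` replaces `usleep(1000)`. The queue size of 64 and the 1 ms sleep are arbitrary assumptions.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for boost::lockfree::spsc_queue: a fixed-size
// single-producer/single-consumer ring buffer holding up to N - 1 items.
template <typename T, std::size_t N>
class ToySpscQueue {
public:
    // Called by the producer thread only. Returns false if the queue is full.
    bool push(const T& v) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;
        buf_[tail] = v;
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Called by the consumer thread only. Returns false if the queue is empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false;
        out = buf_[head];
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    T buf_[N];
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Pushes 1..iterations through the queue with one producer and one consumer
// and returns the sum of everything the consumer received.
long long run_with_backoff(int iterations) {
    ToySpscQueue<int, 64> queue;
    long long sum = 0;

    std::thread producer([&] {
        for (int i = 1; i <= iterations; ++i) {
            // Sleep instead of spinning while the queue is full, so the
            // consumer can be scheduled even when only one core is available.
            while (!queue.push(i))
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    std::thread consumer([&] {
        int value = 0;
        for (int received = 0; received < iterations;) {
            if (queue.pop(value)) {
                sum += value;
                ++received;
            } else {
                std::this_thread::yield();  // queue empty: give up the time slice
            }
        }
    });

    producer.join();
    consumer.join();
    return sum;  // safe to read: the consumer thread has been joined
}
```

Calling `run_with_backoff(1000)` from `main` should return the sum of 1 through 1000. Whether sleeping, yielding, or spinning is fastest will depend on the scheduler and the number of cores, so treat the 1 ms value as something to tune rather than a recommendation.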