I suspect that some power-saving feature of the hardware is at work here. A 10 ms sleep is more than enough time for modern hardware to enter a low-power state, and when you're measuring at the microsecond level, coming back out of a power-saving state has a real, measurable latency.
My guess is that running the "busy" program in parallel prevents the hardware from entering a low-power state. Standard things to try:
- At the BIOS level, disable any and all power-saving features including C-states
- At the OS level, disable cpuspeed (or whatever frequency scaling program your distro uses)
- Try booting with the "idle=poll" kernel parameter
That last suggestion is especially important for Sandy Bridge CPUs (which is what you have), at least with RHEL/CentOS 5.x (which I'm guessing you're running). I found that the Linux kernel would still override some BIOS settings, so changing the BIOS alone may not be enough. Other Linux kernel parameters that may help you:
- intel_idle.max_cstate=0
- processor.max_cstate=0
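
After rebooting, it's worth confirming that the parameters actually took effect. Here is a rough Python sketch that reads the boot command line from /proc/cmdline and lists the idle states the kernel exposes under /sys/devices/system/cpu/cpuN/cpuidle. The function names are my own, the command-line parsing is simplified (it ignores quoted values containing spaces), and I'm assuming a kernel new enough to have the cpuidle sysfs directory; older kernels such as the one in RHEL/CentOS 5.x may not expose it, in which case the state list simply comes back empty.

```python
from pathlib import Path


def kernel_params(cmdline_path="/proc/cmdline"):
    """Return boot parameters as a dict; bare flags (no '=') map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    try:
        text = Path(cmdline_path).read_text()
    except FileNotFoundError:
        return {}
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def cpuidle_states(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return [(state name, exit latency in microseconds), ...] for one CPU.

    Returns an empty list if the kernel exposes no cpuidle directory.
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    state_dirs = sorted(base.glob("state[0-9]*"),
                        key=lambda p: int(p.name[len("state"):]))
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        latency_us = int((state_dir / "latency").read_text())
        states.append((name, latency_us))
    return states


if __name__ == "__main__":
    params = kernel_params()
    for key in ("idle", "intel_idle.max_cstate", "processor.max_cstate"):
        print(f"{key} = {params.get(key, '(not set)')}")
    for name, latency_us in cpuidle_states():
        print(f"{name}: exit latency {latency_us} us")
```

If deep states with large exit latencies still show up after you've set the parameters, that suggests something is overriding your settings and the wake-up cost is still in play.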