As far as I understand, both approaches use polling to synchronize. In pseudo-code, CudaDeviceScheduleSpin is:
while (!IsCudaJobDone())
{
    // busy-wait: keep polling without ever giving up the CPU
}
whereas CudaDeviceScheduleYield is:
while (!IsCudaJobDone())
{
    // offer the rest of the time slice to other threads, then poll again
    Thread.Yield();
}
That is, CudaDeviceScheduleYield tells the operating system that it may preempt the polling thread and run another thread that has other work to do. This improves performance for the other threads on the CPU, but it also increases latency if the CUDA job finishes at a moment when a thread other than the polling one is running.
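To make the difference concrete, here is a minimal C++ sketch of the two waiting strategies. It does not call CUDA at all: the "job" is just a worker thread that sets a flag after a delay, and `job_done`, `IsJobDone`, `WaitSpin`, `WaitYield` and `RunFakeJob` are names I made up for illustration. It only mirrors the pseudo-code above and is not how the driver is actually implemented.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the GPU job: a flag that a worker thread sets.
std::atomic<bool> job_done{false};

bool IsJobDone() { return job_done.load(std::memory_order_acquire); }

// Spin-style wait: poll continuously and never give up the CPU voluntarily.
// Returns how many times the flag was found unset.
long WaitSpin() {
    long polls = 0;
    while (!IsJobDone()) {
        ++polls;
    }
    return polls;
}

// Yield-style wait: poll, but offer the time slice to other threads
// between two polls.
long WaitYield() {
    long polls = 0;
    while (!IsJobDone()) {
        ++polls;
        std::this_thread::yield();
    }
    return polls;
}

// Starts a fake job that takes `ms` milliseconds and waits for it with
// the given strategy. Returns the number of polls that strategy needed.
long RunFakeJob(long (*wait)(), int ms) {
    job_done.store(false, std::memory_order_release);
    std::thread worker([ms] {
        std::this_thread::sleep_for(std::chrono::milliseconds(ms));
        job_done.store(true, std::memory_order_release);
    });
    long polls = wait();
    worker.join();
    return polls;
}
```

Calling `RunFakeJob(WaitSpin, 5)` and `RunFakeJob(WaitYield, 5)` runs the same fake job with each strategy. On a busy machine I would expect the yield version to poll less often, because other threads get to run between polls, but the exact numbers depend entirely on the scheduler and the system load.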