The GNU C library has a sincos() function, which will take advantage of the "FSINCOS" instruction which most modern instruction sets have. I'd say that's your best bet; it should be just as fast as the Intel library method.
If you don't do that, I'd go with the "sqrt(1-sin(x)^2)" route. In every processor architecture document I've looked at so far, the FSQRT instruction is significantly faster than the FSIN function.