If your target platform has a math library, use it. If it is any good, it was written by experts who were considering speed. You should not base code design on guesses about what is fast or slow. If you do not have actual measurements or processor specifications, and you do not know trigonometric functions in your application are consuming a lot of time, then you do not have good reason for replacing the math libraries.
Floating-point instructions typically have longer latencies than integer instructions, but they are pipelined so that throughput may be comparable. (E.g., a floating-point unit might have four stages to do the work, so an instruction takes four cycles to work through all the stages, but you can push a new instruction into the first stage in each cycle.) Whether the pipelining is sufficient to provide performance on a par with an integer implementation depends greatly on the target processor, the algorithm being used, and the skill of the implementor.
If it is beneficial in your case to use custom implementations of the math routines, then how they should be designed is hugely dependent on circumstances. Proper advice depends on the domain to support (Just 0 to 2π? –2π to +2π? Possibly larger values, which have to be folded to -π to π?), what special cases needed to be supported (Propagate NaNs?), the accuracy required, what else is happening in the processor (Is a lot of memory in use or can we rely on a lookup table remaining in cache?), and more.
A significant part of the trigonometric routines is handling various cases (NaNs, infinities, small values) and reducing arguments modulo 2π. It may be possible to implement stripped-down routines that do not handle special cases or perform argument reduction but still use floating-point.