"The problem" might simply be that your hardware now implements sqrt()
directly, which makes it faster than a software approach. It's hard to tell without much more detail about your system, and ideally some profiling and disassembly data.
See this answer for details about the number of cycles for the x86 fsqrt
instruction, for instance.
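
If you want to gather some of that profiling data yourself, here is a rough sketch of a micro-benchmark in C. `newton_sqrt` is a hypothetical stand-in for whatever software approach you are using (I'm assuming a plain Newton-Raphson iteration), and `time_calls` is a helper name I made up. Treat the numbers as a hint only: the function-pointer call adds overhead, and `clock()` is coarse.

```c
#include <math.h>
#include <time.h>

/* Hypothetical software square root (an assumption, not your code):
 * Newton-Raphson iteration starting from a guess >= sqrt(x).
 * The 60-iteration cap keeps it bounded; very large inputs
 * (far beyond about 1e30) would need more iterations to converge. */
static double newton_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;                      /* sketch: no NaN handling */
    double guess = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) {
        double next = 0.5 * (guess + x / guess);
        if (next == guess)               /* no further improvement */
            break;
        guess = next;
    }
    return guess;
}

/* Time n calls of fn over the inputs 1..n and return CPU seconds.
 * The volatile sink stops the compiler from discarding the calls. */
static double time_calls(double (*fn)(double), long n)
{
    volatile double sink = 0.0;
    clock_t start = clock();
    for (long i = 1; i <= n; i++)
        sink += fn((double)i);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call `time_calls(sqrt, n)` and `time_calls(newton_sqrt, n)` from `main` with the same large `n` and compare the two results (link with `-lm` on most Unix toolchains). Looking at the disassembly of your real call site will still tell you whether the compiler emits a hardware square-root instruction there.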