According to AMD GPU Shader Analyzer, the sqrt version,
float bias = 0.005 * sqrt ( 1.f - N_L_dot * N_L_dot ) / N_L_dot ;
generates fewer instructions in the shader assembly (4 instructions, estimated at 4 clock cycles), whereas the trig version,
float bias = 0.005 * tan( acos ( N_L_dot ) );
generates 15 instructions, estimated at 8 clock cycles.
I compared the two methods using the assembly generated for the Radeon HD 6450, but the results seemed to track well across the other Radeon HD cards.
Looks like the sqrt method will generally perform better.