Try using PIX to profile it. http://wtomandev.blogspot.com/2010/05/debugging-hlsl-shaders.html
Alternatively, read this rambling speculation:
Maybe because with a constant, the compiler can unravel and collapse your loop's instructions. When you replace it with a variable, the compiler becomes unable to make the same assumptions.
Though, somewhat unrelated to your actual question, I would push a lot of those conditions /calculations to the software level.
if(IsDiffuseLightingEnabled || IsSpecularLightingEnabled)
^- Like that.
Also, I think you could precompute a few thing before you call the shader program as well. Like l = (lights[i].Position - IN.WorldPosition) / lights[i].Radius;
Pass a precomputed array of those rather than calculating each time over every pixel.
I might be misinformed of the optimizations that the HLSL compiler does, but I think each calculation you do like that on the pixel shader gets executed screen w*h times (though this is done insanely parallel), and I vaguely remember there being some limits to the number of instructions you could have in a shader (like 72?). (though I think that restriction was liberalized a lot in higher versions of HLSL). Maybe the fact that your shader generates so many instructions -- maybe it breaks your program up and turns it into a multi-pass pixel shader on compilation. If that's the case, that probably adds significant overhead.
Actually, here's another idea that might be stupid: Passing a variable to a shader has it transmit the data to the GPU. That transmission happens with limited bandwidth. Perhaps the compiler is smart enough such that when you're only staticly indexing the first 7 elements in an array, only transfer 7 elements. When the compiler doesn't make that optimization (because you aren't iterating with constants), it pushes the WHOLE array every frame, and you're flooding the bus. If that's the case, then my earlier suggestion of pushing calculations out, and passing more results in, would only make the problem worse, heh.