The easiest way to check is to compile with fxc under different profiles and compare the instruction counts. Higher shader models expose new intrinsics that can reduce the total instruction count.
I remember struggling to fit within the 64-instruction limit of ps_2_0, whereas compiling the same shader as ps_3_0 gave me 63 instructions.
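To compare profiles without reading each listing by hand, you could pull the count out of the fxc output. Here is a minimal sketch, assuming you dump the listing with something like `fxc /T ps_2_0 /E main /Fc out.asm shader.hlsl` (the file names and entry point are placeholders) and that the listing contains the usual "approximately N instruction slots used" comment. The helper name `instructionSlots` is my own and not part of any SDK.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: extracts N from a listing line such as
//   "// approximately 63 instruction slots used (4 texture, 59 arithmetic)"
// Returns -1 when the listing contains no such line.
int instructionSlots(const std::string& listing) {
    // Case-insensitive, because the capitalisation of "approximately"
    // is assumed to vary between profiles.
    static const std::regex pattern(
        R"(approximately\s+(\d+)\s+instruction slots? used)",
        std::regex::icase);
    std::smatch match;
    if (!std::regex_search(listing, match, pattern)) {
        return -1;
    }
    return std::stoi(match[1].str());
}
```

Run it on the listing for each profile (ps_2_0, ps_3_0, ...) and compare the numbers side by side.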
Personally, I would use shader model 3 anyway; even phones support that feature set nowadays.
As a performance indicator, instruction count is only one of several things to take into account, since you need to balance arithmetic, texture fetches and fill rate.
In general, lower is better, but one less arithmetic instruction will not make your system much faster if you are draw call bound.
For profiling, if you don't use them already, I strongly recommend implementing Queries; in your case TimeStamp is the most useful one, since it lets you measure shader execution time. Occlusion is also very useful for measuring overdraw in the forward pass (fill rate).
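Once you have the timestamp results back, turning them into milliseconds is simple arithmetic. The sketch below assumes the API gives you two tick values, a tick frequency and a "disjoint" flag, as the D3D11 timestamp and timestamp-disjoint queries do. The `DisjointData` struct is only a stand-in so the snippet builds without the SDK, and `gpuElapsedMs` is a hypothetical name.

```cpp
#include <cstdint>

// Stand-in for the data returned by a timestamp-disjoint query
// (assumed layout: a tick frequency plus a reliability flag).
struct DisjointData {
    std::uint64_t frequency; // GPU ticks per second
    bool disjoint;           // true when timestamps in this interval are unreliable
};

// Converts two GPU timestamps into elapsed milliseconds.
// Returns a negative value when the measurement cannot be trusted.
double gpuElapsedMs(std::uint64_t beginTicks,
                    std::uint64_t endTicks,
                    const DisjointData& data) {
    if (data.disjoint || data.frequency == 0 || endTicks < beginTicks) {
        return -1.0;
    }
    return static_cast<double>(endTicks - beginTicks) * 1000.0 /
           static_cast<double>(data.frequency);
}
```

Put one timestamp before and one after the draw calls that use the shader, read the results a frame or two later so you don't stall the GPU, and throw away any sample flagged as disjoint.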