GDDR memory has fairly high latency, and modern GPU architectures have plenty of number-crunching capability. It used to be the other way around: GPUs were so ill-equipped to do calculations that normalization was cheaper to do by fetching from a cube map.
Throw in the fact that you are not doing a regular texture lookup here, but rather a *dependent* lookup, and it comes as no surprise. Since the location you are fetching from depends on the result of another fetch, the GPU cannot pre-fetch or efficiently cache (an effective latency-hiding strategy) the memory needed by your shader. That is no "simple texture lookup."
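To illustrate what a dependent lookup looks like in practice, here is a minimal sketch (the sampler names are hypothetical, not taken from your shader):

```glsl
uniform sampler2D uOffsetMap; // hypothetical: first fetch produces coordinates
uniform sampler2D uColorMap;  // hypothetical: second fetch depends on the first

void main ()
{
    // The result of this fetch...
    vec2 warped = texture2D (uOffsetMap, gl_TexCoord [0].st).rg;

    // ...determines the address of this one, so the hardware cannot issue
    // the second read until the first has actually returned from memory.
    gl_FragColor = texture2D (uColorMap, warped);
}
```

Two serialized round-trips to memory per fragment, instead of one that can be hidden behind ALU work.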
What is more, in addition to doing a dependent texture lookup, your second shader also includes the `discard` keyword. On a lot of hardware this effectively eliminates the possibility of early depth testing.
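The problem is the mere *presence* of `discard`, not how often it fires. A hypothetical alpha-test shader shows the shape of the issue:

```glsl
// Hypothetical fragment shader; uColorMap is an assumption for illustration.
uniform sampler2D uColorMap;

void main ()
{
    vec4 color = texture2D (uColorMap, gl_TexCoord [0].st);

    // Because this fragment *might* be discarded, many GPUs must run the
    // whole shader before they can decide whether to write depth at all,
    // so the early depth test is disabled for the entire draw call.
    if (color.a < 0.5)
        discard;

    gl_FragColor = color;
}
```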
Honestly, I do not see why you want to "optimize" the `distortionFactor (...)` function into a lookup. It uses squared length, so you are not even dealing with a `sqrt`, just a bunch of multiplication and addition.
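A distortion factor built on squared length boils down to a handful of multiply-adds. As a sketch only (this is a hypothetical form, and `uDistortionStrength` is an assumed uniform, not your actual function):

```glsl
// Hypothetical radial distortion factor, one term of a typical lens model.
uniform float uDistortionStrength;

float distortionFactor (vec2 offsetFromCenter)
{
    // dot (v, v) is the squared length: two multiplies and one add.
    float r2 = dot (offsetFromCenter, offsetFromCenter);

    // One more multiply-add. No sqrt, no division, no memory traffic.
    return 1.0 + uDistortionStrength * r2;
}
```

A few ALU instructions like these are essentially free next to a texture fetch, let alone a dependent one.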