It is not necessarily memory access that degrades performance. With static scheduling (often the default), the loop iterations are divided into chunks that are assigned to threads up front. If one of those threads is bound to a core that is already busy, its chunk finishes late and the whole loop waits for it, which can dramatically slow down your program. If you are running in an environment where you are not guaranteed to be the only user of the resources, you may get better performance with dynamic scheduling, where threads that finish early pick up the remaining chunks.
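Note that OMP_SCHEDULE only affects loops declared with schedule(runtime); a loop without a schedule clause uses the implementation's default and ignores the variable. If your loops use schedule(runtime), run your program with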
OMP_SCHEDULE=dynamic ./my_program
and see if it helps.