_mm_load_ps vs. _mm_load_pd etc. on Intel x86 ISA

https://stackoverflow.com/questions/8856044

28-10-2019

Question

What is the difference between the following two lines?

__m128 x = _mm_load_ps((float *) ptr);
__m128d y = _mm_load_pd((double *)ptr);

In other words, why are there so many different _mm_load_xyz instructions, instead of a generic __m128 _mm_load(const void *)?

Solution

There are different intrinsics because they correspond to different instructions: _mm_load_ps to MOVAPS, _mm_load_pd to MOVAPD, and _mm_load_si128 to MOVDQA.

There are different load instructions because Intel wants to maintain the freedom to design a processor on which double-precision vectors are backed by a different physical register file than are single-precision vectors or integer vectors, or use different execution units. Any of these might add additional latency if there were not a way to specify that data should be loaded into the appropriate register file or forwarding network.

One way to think about it is that the different instructions do the "same thing", but additionally provide a hint to the processor telling it how the data that is being loaded will be used by future instructions. This may help the processor make sure that the data is in the right place to be used as efficiently as possible, or it may be ignored by the processor.

Note that this isn't just a hypothetical. There exist processors on which using an integer vector load (MOVDQA) to load data that is consumed by a floating-point operation takes longer than using a floating-point load for the same data (and vice versa). See the Intel Optimization Manual or Agner Fog's notes for more detail on the subject. Use the load that matches how you will use the data, to avoid the risk of such performance hazards in the future.
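As a rough illustration of "use the load that matches how you will use the data", here is a minimal sketch in C. The function names (add_floats, add_doubles, add_ints) and the is_aligned16 helper are made up for this example; only the intrinsics come from <emmintrin.h>. The instruction named in each comment is the one the intrinsic nominally corresponds to, and a compiler may well emit a different one.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2; also pulls in the SSE __m128 intrinsics */

/* Hypothetical helper: the aligned loads and stores used below
   require 16-byte-aligned addresses. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* float data: float load (MOVAPS) feeding float arithmetic */
void add_floats(const float *a, const float *b, float *out)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    __m128 va = _mm_load_ps(a);            /* 4 x float */
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));
}

/* double data: double load (MOVAPD) feeding double arithmetic */
void add_doubles(const double *a, const double *b, double *out)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    __m128d va = _mm_load_pd(a);           /* 2 x double */
    __m128d vb = _mm_load_pd(b);
    _mm_store_pd(out, _mm_add_pd(va, vb));
}

/* integer data: integer load (MOVDQA) feeding integer arithmetic */
void add_ints(const int32_t *a, const int32_t *b, int32_t *out)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    __m128i va = _mm_load_si128((const __m128i *)a);   /* 4 x int32 */
    __m128i vb = _mm_load_si128((const __m128i *)b);
    _mm_store_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

In each function the vector type (__m128, __m128d, __m128i) stays the same from load to store, so the data never has to move from one domain to another.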

Other tips

_mm_load_ps loads 4 single-precision floating-point values.

_mm_load_pd loads 2 double-precision floating-point values.

These do different things, so I think it just makes sense to have different functions. Also, in C, there's no overloading.
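A small sketch of the difference, assuming SSE2 and 16-byte-aligned inputs. The helper names (sum_ps, sum_pd, load_ps_view_as_pd) are invented for this example. The two loads return different C types, __m128 and __m128d, so a single function cannot stand in for both in plain C; when the same bits are needed under the other type, the cast intrinsics such as _mm_castps_pd reinterpret them without converting any values.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2; also pulls in the SSE __m128 intrinsics */

/* Sum of the 4 float lanes loaded from a 16-byte-aligned address. */
float sum_ps(const float *p)
{
    assert(((uintptr_t)p & 15u) == 0);
    float lanes[4];
    _mm_storeu_ps(lanes, _mm_load_ps(p));   /* __m128: 4 x float */
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}

/* Sum of the 2 double lanes loaded from a 16-byte-aligned address. */
double sum_pd(const double *p)
{
    assert(((uintptr_t)p & 15u) == 0);
    double lanes[2];
    _mm_storeu_pd(lanes, _mm_load_pd(p));   /* __m128d: 2 x double */
    return lanes[0] + lanes[1];
}

/* Load 16 bytes with the float load, then view the same bits as two
   doubles. _mm_castps_pd only changes the C type; no value is converted. */
__m128d load_ps_view_as_pd(const float *p)
{
    assert(((uintptr_t)p & 15u) == 0);
    return _mm_castps_pd(_mm_load_ps(p));
}
```

Both loads read the same 16 bytes; what changes is how the lanes are typed and which arithmetic intrinsics accept the result.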

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow