It sounds like your data is not 16-byte aligned, which is a requirement for aligned SSE loads such as _mm_load_ps. You can either:
- use _mm_loadu_ps as a temporary workaround. On newer CPUs the performance hit for misaligned loads such as this is fairly small (on older CPUs it's much more significant), but it should still be avoided if possible
or
- fix your memory alignment. On Windows/Visual Studio you can use the __declspec(align(16)) attribute for static allocations or _aligned_malloc (paired with _aligned_free) for dynamic allocations. For gcc and most other civilised platforms/compilers use __attribute__ ((aligned(16))) for the former and posix_memalign for the latter.
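As a rough illustration of both options, here is a minimal sketch in C. The names ALIGN16, static_data, alloc_floats_16, free_floats_16, is_aligned_16 and sum4 are made up for this example and are not part of any library. It assumes an x86 target with SSE enabled.

```c
#if !defined(_MSC_VER) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L  /* makes posix_memalign visible in strict ISO modes */
#endif

#include <stdint.h>
#include <stdlib.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_load_ps, _mm_loadu_ps, _mm_store_ps */

#ifdef _MSC_VER
#include <malloc.h>     /* _aligned_malloc, _aligned_free */
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))
#endif

/* Static allocation: the compiler places this array on a 16-byte boundary. */
static ALIGN16 float static_data[4] = {1.0f, 2.0f, 3.0f, 4.0f};

/* Hypothetical helper: allocate n floats on a 16-byte boundary (NULL on failure). */
static float *alloc_floats_16(size_t n)
{
#ifdef _MSC_VER
    return (float *)_aligned_malloc(n * sizeof(float), 16);
#else
    void *p = NULL;
    if (posix_memalign(&p, 16, n * sizeof(float)) != 0)
        return NULL;
    return (float *)p;
#endif
}

/* Hypothetical helper: release memory obtained from alloc_floats_16. */
static void free_floats_16(float *p)
{
#ifdef _MSC_VER
    _aligned_free(p);   /* memory from _aligned_malloc must not go to free() */
#else
    free(p);
#endif
}

/* Returns nonzero if p sits on a 16-byte boundary. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Sums four floats, using the aligned load only when the pointer allows it
   and falling back to the unaligned load otherwise. */
static float sum4(const float *p)
{
    __m128 v = is_aligned_16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
    ALIGN16 float out[4];
    _mm_store_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

The runtime check in sum4 is only there to show both loads side by side. In real code you would normally guarantee alignment at allocation time and call _mm_load_ps unconditionally.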