Why does assignment to an element of an AVX-Vector-wrapper-class-object-array provoke access violation errors?
27-05-2021
Question
I am trying to do some vector math and wrote a wrapper for the __m256d data type from immintrin.h so that I can use overloaded operators. The following example should give you the basic idea.
Class definition
#include <immintrin.h>
using namespace std;
class vwrap {
public:
__m256d d;
vwrap(void) {
this->d = _mm256_set_pd(0.0,0.0,0.0,0.0);
}
void init (const double &a, const double &b, const double &c) {
this->d = _mm256_set_pd(0.0,c,b,a);
}
};
Array of vwrap objects
Let's imagine an array of vwrap that is allocated dynamically:
vwrap *a = (vwrap*) malloc(sizeof(vwrap)*2);
Access violation errors
Calling a member function of a vwrap object that uses an _mm256 set function provokes an access violation error.
a[0].init(1.3,2.3,1.2);
The same thing happens when assigning to d from an _mm256 set function (assigning another __m256d object doesn't work either):
a[0].d = _mm256_set_pd(1,2,3,4);
Copying data from another object doesn't work either.
vwrap b;
a[0].d = b.d;
Stuff that works
The individual elements of the __m256d object can be written without any problems:
a[0].d.m256d_f64[0] = 1.0;
a[0].d.m256d_f64[1] = 2.0;
a[0].d.m256d_f64[2] = 3.0;
a[0].d.m256d_f64[3] = 4.0;
The assignments are working in case of a normal class instance:
vwrap b,c;
__m256d t = _mm256_set_pd(1,2,3,5);
b.d = _mm256_set_pd(1,2,3,4);
b.d = t;
b.d = c.d;
I don't understand the problem. Why can't I use the _mm256 functions (or assign an __m256d object) when the object is an element of a dynamically allocated array? My only idea is to avoid the _mm256 functions and manipulate the double values directly, but that is not what I originally intended.
Solution
It's likely an alignment problem. __m256d needs to be aligned on 32-byte boundaries. You shouldn't use malloc when alignment is a concern; use new or an aligned malloc.
The reason your stack-allocated variables work correctly is that the compiler aligns them properly, because it knows they need to be aligned. When you call malloc, on the other hand, the runtime has no way of knowing what you plan to store in the memory it gives you. Therefore, you need to either request the alignment explicitly using an aligned malloc, or use type-aware allocation, which is what new is for.
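One way to confirm this diagnosis is to look at the address malloc handed back. Below is a minimal sketch, not code from the question: `Aligned32` is a hypothetical stand-in for vwrap (same 32-byte size and alignment, but it builds without AVX compiler flags), and `is_aligned` is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for vwrap: same size and alignment as a class
// holding a single __m256d, but usable without AVX compiler flags.
struct alignas(32) Aligned32 {
    double v[4];
};

// Hypothetical helper: returns true if p lies on an `alignment`-byte
// boundary. `alignment` must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

In the question's code, the equivalent check would be `is_aligned(a, alignof(vwrap))` right after the malloc call. malloc only promises alignment suitable for fundamental types (commonly 8 or 16 bytes), so the check can fail there, whereas for a stack object such as `vwrap b;` the compiler guarantees it succeeds. That would also explain why the element-wise writes work: plain scalar stores of a double do not need 32-byte alignment.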
Changing
vwrap *a = (vwrap*) malloc(sizeof(vwrap)*2);
to either
vwrap *a = new vwrap[2];
or
vwrap *a = (vwrap*) _aligned_malloc(sizeof(vwrap)*2, 32);
should work.
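The aligned-allocation route can be sketched a bit more fully. This is only an illustration under stated assumptions: `Vec4` is a hypothetical stand-in for vwrap so the example builds without AVX flags, `make_vec4_array` and `destroy_vec4_array` are made-up helper names, and on non-Windows platforms the sketch swaps in posix_memalign because _aligned_malloc is Windows-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Hypothetical stand-in for vwrap (32-byte aligned, zero-initialised).
struct alignas(32) Vec4 {
    double v[4];
    Vec4() : v{0.0, 0.0, 0.0, 0.0} {}
};

// Hypothetical helper: allocate n elements on a 32-byte boundary and run
// their constructors. Returns nullptr if the allocation fails.
inline Vec4* make_vec4_array(std::size_t n) {
    void* raw = nullptr;
#if defined(_WIN32)
    raw = _aligned_malloc(sizeof(Vec4) * n, alignof(Vec4));
#else
    // Assumption: a POSIX system; posix_memalign replaces _aligned_malloc.
    if (posix_memalign(&raw, alignof(Vec4), sizeof(Vec4) * n) != 0) raw = nullptr;
#endif
    if (raw == nullptr) return nullptr;
    assert(reinterpret_cast<std::uintptr_t>(raw) % alignof(Vec4) == 0);
    Vec4* a = static_cast<Vec4*>(raw);
    // A raw allocation does not call constructors, so use placement new.
    for (std::size_t i = 0; i < n; ++i) new (a + i) Vec4();
    return a;
}

// Hypothetical helper: destroy the elements and release the memory with
// the deallocator that matches the allocator used above.
inline void destroy_vec4_array(Vec4* a, std::size_t n) {
    if (a == nullptr) return;
    for (std::size_t i = 0; i < n; ++i) a[i].~Vec4();
#if defined(_WIN32)
    _aligned_free(a);
#else
    std::free(a);
#endif
}
```

Two details are worth keeping in mind. Memory obtained from _aligned_malloc has to be released with _aligned_free, not free. And new is only required to honor extended alignments such as 32 bytes from C++17 onwards, so with older compilers the new variant may still return insufficiently aligned memory.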
EDIT: After trying this out on Windows with GCC 4.6.1 (compiler switch -march=corei7-avx), it seems new doesn't respect alignment requirements. Changing the new call to use _aligned_malloc works.