Why does assignment to an element of an AVX-Vector-wrapper-class-object-array provoke access violation errors?

StackOverflow https://stackoverflow.com/questions/9916419

Question

I am trying to do some vector stuff and wrote a wrapper for the m256d datatype from immintrin.h to use overloaded operators. The following example should give you a basic idea.

Class definition

#include <immintrin.h>
using namespace std;
class vwrap {
public:
  __m256d d;
  vwrap(void) { 
    this->d = _mm256_set_pd(0.0,0.0,0.0,0.0); 
  }
  void init (const double &a, const double &b, const double &c) { 
    this->d = _mm256_set_pd(0.0,c,b,a);
  }
};

Array of vwrap objects

Let's imagine an array of vwrap that is allocated dynamically:

vwrap *a = (vwrap*) malloc(sizeof(vwrap)*2);

Access violation errors

Using a function of a vwrap object that contains a mm256-set-function... provokes an access violation error.

a[0].init(1.3,2.3,1.2);

The same thing happens when assigning to d from an mm256-set-function (assigning another m256d-object doesn't work either):

a[0].d = _mm256_set_pd(1,2,3,4);

Copying data from another object doesn't work either.

vwrap b;
a[0].d = b.d;

Stuff that works

The m256d-object can be manipulated without any problems:

a[0].d.m256d_f64[0] = 1.0;
a[0].d.m256d_f64[1] = 2.0;
a[0].d.m256d_f64[2] = 3.0;
a[0].d.m256d_f64[3] = 4.0;

The assignments are working in case of a normal class instance:

vwrap b,c;
__m256d t = _mm256_set_pd(1,2,3,5);
b.d = _mm256_set_pd(1,2,3,4); 
b.d = t;
b.d = c.d;

I don't understand the problem. Why can't I use the _mm256 functions (or assign an m256d-object) when the object is an element of a class array? My only idea is to avoid the mm256-functions and manipulate the double values directly, but that is not what I originally intended to do.


Solution

It's likely an alignment problem. A __m256d needs to be aligned on a 32-byte boundary. You shouldn't use plain malloc when alignment is a concern; use new or an aligned malloc.

The reason your stack-allocated variables work correctly is that the compiler aligns them properly, because it knows they need to be aligned. When you call malloc, on the other hand, the runtime has no way of knowing what you plan to store in the memory it gives you. Therefore, you need to either explicitly request alignment using an aligned malloc, or use type-aware allocation, which is what new is for.
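To make the difference concrete, here is a minimal sketch that checks whether an address satisfies a 32-byte alignment requirement. Vec4d, is_aligned and malloc_block_is_usable are hypothetical names used only for illustration; Vec4d is a stand-in with the same size and alignment as a class holding one __m256d, chosen so the snippet compiles without AVX compiler flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for vwrap: 32 bytes large and 32-byte aligned,
// like a class with a single __m256d member.
struct alignas(32) Vec4d {
    double v[4];
};

// Returns true if p is a multiple of `alignment` (must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Reports whether a block returned by malloc happens to be aligned well
// enough for Vec4d. malloc only promises alignment suitable for the
// fundamental types, so the result can be false and can vary between runs.
inline bool malloc_block_is_usable(std::size_t count) {
    void* raw = std::malloc(sizeof(Vec4d) * count);
    if (raw == nullptr) return false;
    const bool ok = is_aligned(raw, alignof(Vec4d));
    std::free(raw);
    return ok;
}
```

A local Vec4d always passes is_aligned(&local, 32) because the compiler places it, while malloc_block_is_usable(2) may return either value depending on the allocator. Aligned AVX loads and stores fault in exactly the case where it returns false.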

Changing

vwrap *a = (vwrap*) malloc(sizeof(vwrap)*2);

to

vwrap *a = new vwrap[2];

or

vwrap *a = (vwrap*) _aligned_malloc(sizeof(vwrap)*2, 32);

should work.

EDIT: After trying this out on Windows with GCC 4.6.1 (compiler switch -march=corei7-avx) it seems new doesn't respect alignment requirements. Changing the new call to use _aligned_malloc works.
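Memory obtained from _aligned_malloc must be released with _aligned_free, and, as with malloc, no constructors run on it. The following is a rough sketch of a wrapper that handles both points. alloc_aligned, free_aligned, make_vec_array and destroy_vec_array are hypothetical helper names, the posix_memalign branch for non-Windows systems is an addition not mentioned above, and Vec4d is again a stand-in for vwrap that needs no AVX compiler flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <stdlib.h>   // posix_memalign on POSIX systems
#if defined(_WIN32)
#include <malloc.h>   // _aligned_malloc, _aligned_free
#endif

// Hypothetical stand-in for vwrap (32 bytes, 32-byte aligned).
struct alignas(32) Vec4d {
    double v[4];
    Vec4d() : v{0.0, 0.0, 0.0, 0.0} {}
};

// Allocates `bytes` bytes aligned to `alignment`; returns nullptr on failure.
inline void* alloc_aligned(std::size_t bytes, std::size_t alignment) {
#if defined(_WIN32)
    return _aligned_malloc(bytes, alignment);
#else
    void* p = nullptr;
    if (posix_memalign(&p, alignment, bytes) != 0) return nullptr;
    return p;
#endif
}

// Releases memory obtained from alloc_aligned.
inline void free_aligned(void* p) {
#if defined(_WIN32)
    _aligned_free(p);
#else
    std::free(p);
#endif
}

// Allocates aligned storage and default-constructs `count` elements in it.
inline Vec4d* make_vec_array(std::size_t count) {
    void* raw = alloc_aligned(sizeof(Vec4d) * count, alignof(Vec4d));
    if (raw == nullptr) return nullptr;
    Vec4d* arr = static_cast<Vec4d*>(raw);
    for (std::size_t i = 0; i < count; ++i) {
        new (arr + i) Vec4d();   // placement new runs the constructor
    }
    return arr;
}

// Destroys the elements, then frees the aligned storage.
inline void destroy_vec_array(Vec4d* arr, std::size_t count) {
    if (arr == nullptr) return;
    for (std::size_t i = 0; i < count; ++i) arr[i].~Vec4d();
    free_aligned(arr);
}
```

Usage would look like Vec4d *a = make_vec_array(2); followed later by destroy_vec_array(a, 2);. Note that since C++17 a plain new expression is required to honor over-aligned types, so with a recent compiler in C++17 mode new vwrap[2] should be sufficient; the helpers above are mainly relevant for older toolchains such as the GCC 4.6.1 mentioned in the edit.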

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow