Question

I am trying to get SSE functionality in my vector class (I've rewritten it three times so far. :\) and I'm doing the following:

#ifndef _POINT_FINAL_H_
#define _POINT_FINAL_H_

#include "math.h"

namespace Vector3D
{

#define SSE_VERSION 3

#if SSE_VERSION >= 2

    #include <emmintrin.h>  // SSE2

    #if SSE_VERSION >= 3

        #include <pmmintrin.h>  // SSE3

    #endif

#else

#include <stdlib.h>

#endif

#if SSE_VERSION >= 2

    typedef union { __m128 vector; float numbers[4]; } VectorData;
    //typedef union { __m128 vector; struct { float x, y, z, w; }; } VectorData;

#else

    typedef struct { float x, y, z, w; } VectorData;

#endif

class Point3D
{

public:

    Point3D();
    Point3D(float a_X, float a_Y, float a_Z);
    Point3D(VectorData* a_Data);
    ~Point3D();

    // a lot of not-so-interesting functions

private:

    VectorData* _NewData();

}; // class Point3D

}; // namespace Vector3D

#endif

It works! Hurray! But it's slower than my previous attempt. Boo.

I've determined that my bottleneck is the malloc I'm using to get a pointer to a struct.

VectorData* Point3D::_NewData() 
{ 

#if SSE_VERSION >= 2

    return ((VectorData*) _aligned_malloc(sizeof(VectorData), 16)); 

#else

    return ((VectorData*) malloc(sizeof(VectorData))); 

#endif

}

One of the main problems with using SSE in a class is that it has to be aligned in memory for it to work, which means overloading the new and delete operators, resulting in code like this:

 BadVector* test1 = new BadVector(1, 2, 3);
 BadVector* test2 = new BadVector(4, 5, 6);
 *test1 *= *test2;

You can no longer use the default constructor and you have to avoid new like the plague.
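For reference, here is a minimal sketch of what "overloading the new and delete operators" could look like for such a class. The class name `AlignedVector` and the helper `HeapDemo` are made up for illustration, and `_mm_malloc`/`_mm_free` stand in for whichever aligned allocator the compiler provides (`_aligned_malloc`/`_aligned_free` on MSVC). It does not cover `new[]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <xmmintrin.h>  // SSE: __m128, _mm_set_ps, _mm_mul_ps, _mm_malloc, _mm_free

// Hypothetical class whose heap instances always start on a 16-byte boundary.
class AlignedVector
{
public:
    AlignedVector(float x, float y, float z)
        : m_Data(_mm_set_ps(0.0f, z, y, x))  // lanes in memory order: x, y, z, 0
    {
    }

    // Class-specific allocation: every `new AlignedVector(...)` goes through here.
    static void* operator new(std::size_t size)
    {
        void* p = _mm_malloc(size, 16);
        if (!p) throw std::bad_alloc();
        return p;
    }

    // Matching deallocation: every `delete ptr` goes through here.
    static void operator delete(void* p) noexcept
    {
        _mm_free(p);
    }

    // Component-wise multiply.
    AlignedVector& operator*=(const AlignedVector& other)
    {
        m_Data = _mm_mul_ps(m_Data, other.m_Data);
        return *this;
    }

    // Lowest lane (x).
    float X() const { return _mm_cvtss_f32(m_Data); }

private:
    __m128 m_Data;
};

// Allocates two vectors on the heap, multiplies them, and checks the x lane.
inline bool HeapDemo()
{
    AlignedVector* test1 = new AlignedVector(1.0f, 2.0f, 3.0f);
    AlignedVector* test2 = new AlignedVector(4.0f, 5.0f, 6.0f);
    assert(reinterpret_cast<std::uintptr_t>(test1) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(test2) % 16 == 0);
    *test1 *= *test2;
    const bool ok = test1->X() == 4.0f;  // 1 * 4
    delete test1;
    delete test2;
    return ok;
}
```

The calling code still has to dereference both pointers, which is the ugliness I'm complaining about.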

My new approach is basically to have the data external from the class so the class doesn't have to be aligned.

My question is: is there a better way to get a pointer to an instance of a struct that is aligned in memory, or is my approach really dumb and there's a much cleaner way?


Solution

How about:

__declspec( align( 16 ) ) VectorData vd;

?
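`__declspec( align( 16 ) )` is MSVC-specific. On a C++11 compiler the standard `alignas` specifier should do the same job. The sketch below assumes a plain four-float `VectorData` struct (not the asker's union), and the helper names `IsAligned16` and `AddOnStack` are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_store_ps, _mm_add_ps

// Plain four-float storage, forced onto a 16-byte boundary with C++11 alignas.
struct alignas(16) VectorData
{
    float numbers[4];
};

// True when p sits on a 16-byte boundary.
inline bool IsAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Adds two stack-allocated vectors using aligned SSE loads and stores.
inline VectorData AddOnStack()
{
    VectorData a = { { 1.0f, 2.0f, 3.0f, 0.0f } };
    VectorData b = { { 4.0f, 5.0f, 6.0f, 0.0f } };
    VectorData sum;

    // The compiler places all three locals on 16-byte boundaries.
    assert(IsAligned16(&a) && IsAligned16(&b) && IsAligned16(&sum));

    _mm_store_ps(sum.numbers,
                 _mm_add_ps(_mm_load_ps(a.numbers), _mm_load_ps(b.numbers)));
    return sum;
}
```

This only covers objects with automatic or static storage; heap allocations still need an aligned allocator.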

You can also create your own version of operator new as follows:

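// _aligned_malloc is MSVC-specific (declared in <malloc.h>); release the memory with _aligned_free.
void* operator new( size_t size, size_t alignment )
{
     return _aligned_malloc( size, alignment );
}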

which can then be used to allocate as follows

AlignedData* pData = new( 16 ) AlignedData;

to align at a 16 byte boundary.
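A slightly fuller sketch of the same idea follows, under a few assumptions. A placement `operator new( size_t, size_t )` can clash with the sized `operator delete( void*, size_t )` that C++14 introduced, so this version passes the alignment through a made-up tag type, `AlignTo`. `_mm_malloc`/`_mm_free` are used instead of the MSVC-only `_aligned_malloc`, and `DestroyAligned` and `AllocateAndCheck` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <xmmintrin.h>  // SSE: _mm_malloc, _mm_free, _mm_store_ps, _mm_set1_ps

// Hypothetical tag type carrying the requested alignment in bytes.
struct AlignTo
{
    std::size_t bytes;
};

// Placement-style allocation, used as: new (AlignTo{16}) T
void* operator new(std::size_t size, AlignTo alignment)
{
    void* p = _mm_malloc(size, alignment.bytes);
    if (!p) throw std::bad_alloc();
    return p;
}

// Matching placement delete; the compiler calls it only if a constructor throws.
void operator delete(void* p, AlignTo) noexcept
{
    _mm_free(p);
}

struct AlignedData
{
    float numbers[4];
};

// A plain `delete` would call the wrong deallocator, so destroy and free by hand.
template <typename T>
void DestroyAligned(T* p)
{
    if (p)
    {
        p->~T();
        _mm_free(p);
    }
}

// Allocates one AlignedData on a 16-byte boundary and writes to it with an aligned store.
inline bool AllocateAndCheck()
{
    AlignedData* pData = new (AlignTo{16}) AlignedData;
    const bool aligned = reinterpret_cast<std::uintptr_t>(pData) % 16 == 0;
    assert(aligned);
    _mm_store_ps(pData->numbers, _mm_set1_ps(2.0f));
    const bool ok = aligned && pData->numbers[3] == 2.0f;
    DestroyAligned(pData);
    return ok;
}
```

Memory obtained this way must not be released with a plain `delete`, which is why the sketch pairs the allocation with its own destroy helper.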

If that's no help then I may be misunderstanding what you are asking for...

OTHER TIPS

You should probably not expect improved performance for single-use vectors. SIMD pays off most when there is volume, i.e. when you process many vectors in sequence.
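As a rough illustration of "many vectors in sequence", the sketch below adds two contiguous arrays of four-float vectors in one loop. The function names and the array layout are assumptions for the example, not something taken from the question. It uses unaligned loads and stores so it works with any buffer; with 16-byte aligned buffers, `_mm_load_ps`/`_mm_store_ps` could be used instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_storeu_ps, _mm_add_ps

// Computes out[i] = a[i] + b[i] for `count` four-float vectors.
// Each array must hold at least 4 * count floats.
inline void AddVectors(const float* a, const float* b, float* out, std::size_t count)
{
    assert(a != nullptr && b != nullptr && out != nullptr);
    for (std::size_t i = 0; i < count; ++i)
    {
        const __m128 lhs = _mm_loadu_ps(a + 4 * i);
        const __m128 rhs = _mm_loadu_ps(b + 4 * i);
        _mm_storeu_ps(out + 4 * i, _mm_add_ps(lhs, rhs));
    }
}

// Runs the batch add over 256 vectors and checks every component.
inline bool BatchDemo()
{
    const std::size_t count = 256;
    std::vector<float> a(4 * count, 1.5f);
    std::vector<float> b(4 * count, 2.0f);
    std::vector<float> out(4 * count, 0.0f);

    AddVectors(a.data(), b.data(), out.data(), count);

    for (std::size_t i = 0; i < out.size(); ++i)
    {
        if (out[i] != 3.5f) return false;
    }
    return true;
}
```

Here the data lives in one contiguous block, so there is one allocation for the whole batch instead of one per vector.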

I fixed it. :O

It was really rather easy. All I had to do was turn

VectorData* m_Point;

into

VectorData m_Point;

and my problems are gone, with no need for malloc or aligning.
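Roughly, the class now looks like the sketch below (trimmed down, with a made-up `Store` accessor and `ValueMemberDemo` helper for the example). Because the member is held by value, the class picks up the 16-byte alignment requirement of `__m128`, and the compiler aligns stack and static instances on its own. One caveat I'm assuming rather than have measured: before C++17, a plain `new Point3D` on the heap only gets whatever alignment the default allocator provides.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_set_ps, _mm_add_ps, _mm_storeu_ps

union VectorData
{
    __m128 vector;
    float numbers[4];
};

class Point3D
{
public:
    Point3D(float a_X, float a_Y, float a_Z)
    {
        m_Point.vector = _mm_set_ps(0.0f, a_Z, a_Y, a_X);  // memory order: x, y, z, 0
    }

    Point3D& operator+=(const Point3D& a_Other)
    {
        m_Point.vector = _mm_add_ps(m_Point.vector, a_Other.m_Point.vector);
        return *this;
    }

    // Hypothetical accessor: copies the four lanes into a caller-supplied array.
    void Store(float a_Out[4]) const
    {
        _mm_storeu_ps(a_Out, m_Point.vector);
    }

private:
    VectorData m_Point;  // held by value, no malloc involved
};

// The class inherits the alignment requirement of its __m128 member.
static_assert(alignof(Point3D) == 16, "Point3D should be 16-byte aligned");

// Adds two stack-allocated points and checks the result.
inline bool ValueMemberDemo()
{
    Point3D a(1.0f, 2.0f, 3.0f);
    Point3D b(4.0f, 5.0f, 6.0f);
    assert(reinterpret_cast<std::uintptr_t>(&a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(&b) % 16 == 0);

    a += b;

    float out[4];
    a.Store(out);
    return out[0] == 5.0f && out[1] == 7.0f && out[2] == 9.0f;
}
```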

But I appreciate everyone's help! :D

Licensed under: CC-BY-SA with attribution