Question

I'm trying to define a custom point type for the PCL library. In that tutorial, they're talking about memory alignment, so I started off by trying to understand how it works.

On this page, they present a rather simple way of calculating the size of a structure once alignment padding is taken into account. For example, this structure

// Alignment requirements
// (typical 32 bit machine)

// char         1 byte
// short int    2 bytes
// int          4 bytes
// double       8 bytes

// structure C
typedef struct structc_tag
{
  char        c;
  double      d;
  int         s;
} structc_t;

will have a size of 24:

1 byte for the char + 7 bytes of padding + 8 bytes for the double + 4 bytes for the int + 4 bytes of padding

and for g++ 4.8.1, sizeof returns 24. So far, so good.
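The padding arithmetic above can be checked with offsetof and sizeof. This is a minimal sketch, assuming a platform where double is 8-byte aligned inside structs (typical for x86-64; 32-bit Linux aligns it to 4, which gives different numbers). StructC, padding_after_c, trailing_padding and check_layout are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Same members, in the same order, as structc_t above.
struct StructC {
    char   c;
    double d;
    int    s;
};

// Hypothetical helper: bytes of padding between c and d.
inline std::size_t padding_after_c() {
    return offsetof(StructC, d) - (offsetof(StructC, c) + sizeof(char));
}

// Hypothetical helper: bytes of padding after the last member (s).
inline std::size_t trailing_padding() {
    return sizeof(StructC) - (offsetof(StructC, s) + sizeof(int));
}

// Members plus both padding runs must add up to the full size.
inline void check_layout() {
    assert(sizeof(char) + padding_after_c() + sizeof(double) +
           sizeof(int) + trailing_padding() == sizeof(StructC));
}
```

The trailing padding exists so that, in an array of StructC, the double in the next element still starts at a multiple of its alignment.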

Now, in PCL they define the point types with this scheme for SSE alignment (here's the simplest point, which holds the position along each axis).

union
{
  float data[4];
  struct
  {
    float x;
    float y;
    float z;
  };
};

sizeof returns 16. As I understand it, the union makes sure the point type is SSE aligned (I read here that this means 16-byte alignment), and the struct makes the axis values accessible by name.

Quoting from the PCL docs:

The user can either access points[i].data[0] or points[i].x for accessing say, the x coordinate.

Is my reasoning valid until here?


In my case, I want to replace the floats with doubles in order to have more precision in the X and Y axes.

So, is it enough to declare the point type as:

union {
  float data[4];
  struct {
    double x;
    double y;
    float z;
  };
};

? sizeof returns 24, which is not a multiple of 16 (so I understand it's not SSE aligned) but it is "double aligned".

My question is, how can I define my point type to be able to store the X and Y coordinates as double and still be SSE aligned?

PS: Also, if any of you know of a good resource for this, please tell me. I want to understand this topic better.

PS 2: I forgot to mention that the platform I'm trying all of this on is a 64-bit one.

PS 3: If possible, I'm interested in pre-C++11 solutions. A compiler as old as g++ 4.4 (and its MinGW counterpart) must be able to build the new point type.

Solution

The size of an object and its alignment are not the same thing. If the size of the struct is 16 bytes, or some multiple of 16, that does not mean it will necessarily be 16-byte aligned.
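To make the distinction concrete, here is a minimal sketch that queries both properties separately. It assumes the compiler-specific pre-C++11 operators __alignof__ (GCC/Clang) and __alignof (MSVC). ALIGN_OF, Float4 and is_aligned are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t

// Pre-C++11 spelling of "alignment of T"; compiler-specific.
#if defined(_MSC_VER)
#define ALIGN_OF(T) __alignof(T)
#else
#define ALIGN_OF(T) __alignof__(T)
#endif

// 16 bytes in size, but only as strictly aligned as a single float.
struct Float4 {
    float data[4];
};

// Hypothetical helper: true if p sits on a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    // size_t is assumed wide enough to hold a pointer value
    // (true on mainstream 32-bit and 64-bit platforms).
    return reinterpret_cast<std::size_t>(p) % alignment == 0;
}
```

On typical platforms sizeof(Float4) is 16 while ALIGN_OF(Float4) is 4, so the compiler is free to place a Float4 at any address that is a multiple of 4.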

In your case since your code is compiled in 64-bit mode you just need to pad the struct to 32 bytes. In 64-bit mode the stack is 16 byte aligned in Windows and Linux/Unix.

In 32-bit mode it does not have to be 16 byte aligned. You can test this. If you run the code below in MSVC in 32-bit mode you will likely see that the address for each element of the array is not 16 byte aligned (you might have to run it a few times). So even though the size of the struct is a multiple of 16 bytes it is not necessarily 16 byte aligned.

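#include <stdio.h>
#include <stddef.h>  // size_t

int main() {
    union a {
        float data[4];
        struct {
            double x;
            double y;
            float z;
            float pad[3];  // pads the struct to 32 bytes
        };                 // closes the anonymous struct
    };
    a b[10];
    for (int i = 0; i < 10; i++) {
        // Cast through size_t so the address is not truncated on 64-bit builds.
        printf("%u\n", (unsigned)((size_t)&b[i] % 16));
    }
    return 0;
}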

If you want your code to work in 32-bit mode as well then you should align the memory. If you run the code below in 32-bit mode on Windows or Linux you will see that it's always 16 byte aligned as well.

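#include <stdio.h>
#include <stddef.h>  // size_t
#ifdef _MSC_VER // If Microsoft compiler
#define Alignd(X) __declspec(align(16)) X
#else // Gnu compiler, etc.
#define Alignd(X) X __attribute__((aligned(16)))
#endif

int main() {
    union a {
        float data[4];
        struct {
            double x;
            double y;
            float z;
            float pad[3];  // pads the struct to 32 bytes
        };                 // closes the anonymous struct
    };
    a Alignd(b[10]);       // the array itself starts on a 16-byte boundary
    for (int i = 0; i < 10; i++) {
        // Cast through size_t so the address is not truncated on 64-bit builds.
        printf("%u\n", (unsigned)((size_t)&b[i] % 16));
    }
    return 0;
}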
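An alternative is to attach the alignment to the type instead of to each variable, so every stack or static instance is aligned without extra annotation. This is a sketch rather than PCL's own definition: ALIGN16, ALIGN_OF, PointXYd and count_aligned_elements are hypothetical names. It assumes both GCC and MSVC accept the attribute between the struct keyword and the type name.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t

#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#define ALIGN_OF(T) __alignof(T)
#else
#define ALIGN16 __attribute__((aligned(16)))
#define ALIGN_OF(T) __alignof__(T)
#endif

// Hypothetical point type: x and y as double, z as float, padded to 32 bytes.
struct ALIGN16 PointXYd {
    double x;
    double y;
    float  z;
    float  pad[3];
};

// Counts how many elements of a local array start on a 16-byte boundary.
inline int count_aligned_elements() {
    // The array stride must be a multiple of 16 for every element to stay aligned.
    assert(sizeof(PointXYd) % 16 == 0);
    PointXYd pts[10];
    int n = 0;
    for (int i = 0; i < 10; ++i) {
        if (reinterpret_cast<std::size_t>(&pts[i]) % 16 == 0) ++n;
    }
    return n;
}
```

Note that this covers stack and static storage only. Before C++17, operator new is not required to honour the extra alignment, so heap-allocated points still need an aligned allocator.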

Other tips

To have a struct that holds 2 doubles and a float and can be SSE aligned (16 bytes), use:

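#pragma pack(push, 1)  // no implicit padding between members
struct T
{
 double x,y;   // 16 bytes
 float z;      // 4 bytes
 char gap[12]; // 12 bytes of explicit padding, 32 in total
};
#pragma pack(pop)      // restore the previous packing for later declarations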

sizeof(T) will be 32, so if the first point is 16-byte aligned, every point in the vector will be aligned.

To make the first point aligned, use __attribute__((aligned(16))) for stack variables (GCC syntax), or an aligned allocation function such as aligned_alloc (C11) for heap memory.
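Since aligned_alloc is not available to pre-C++11 toolchains, a heap buffer for the points can be obtained with platform calls instead. Here is a minimal sketch, assuming posix_memalign on POSIX systems and _aligned_malloc / _aligned_free on MSVC and MinGW. alloc_aligned16 and free_aligned16 are hypothetical wrapper names.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <stdlib.h>  // posix_memalign, free
#if defined(_MSC_VER) || defined(__MINGW32__)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Hypothetical wrapper: returns a 16-byte aligned block, or 0 on failure.
inline void* alloc_aligned16(std::size_t bytes) {
    assert(bytes > 0);
#if defined(_MSC_VER) || defined(__MINGW32__)
    return _aligned_malloc(bytes, 16);
#else
    void* p = 0;
    if (posix_memalign(&p, 16, bytes) != 0) return 0;
    return p;
#endif
}

// Hypothetical wrapper: releases a block obtained from alloc_aligned16.
inline void free_aligned16(void* p) {
#if defined(_MSC_VER) || defined(__MINGW32__)
    _aligned_free(p);  // must pair with _aligned_malloc
#else
    free(p);           // posix_memalign memory is released with free
#endif
}
```

A buffer for n points would then be requested as alloc_aligned16(n * sizeof(T)). Because sizeof(T) is 32, every element in that buffer starts on a 16-byte boundary.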

But, most of the algorithms of PCL are written and hard-coded for floats and not doubles, so they won't work...

Reference: pcl-users link

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow