Question

I am new to optimizing code with SSE/SSE2 instructions and until now I have not gotten very far. To my knowledge a common SSE-optimized function would look like this:

void sse_func(const float* const ptr, int len){
    if( ptr is aligned )
    {
        for( ... ){
            // unroll loop by 4 or 2 elements
        }
        for( ....){
            // handle the rest
            // (non-optimized code)
        }
    } else {
        for( ....){
            // regular C code to handle non-aligned memory
        }
    }
}

However, how do I correctly determine if the memory ptr points to is aligned by e.g. 16 Bytes? I think I have to include the regular C code path for non-aligned memory as I cannot make sure that every memory passed to this function will be aligned. And using the intrinsics to load data from unaligned memory into the SSE registers seems to be horrible slow (Even slower than regular C code).

Thank you in advance...

Was it helpful?

Solution

EDIT: casting to long is a cheap way to protect oneself against the most likely possibility of int and pointers being different sizes nowadays.

As pointed out in the comments below, there are better solutions if you are willing to include a header...

A pointer p is aligned on a 16-byte boundary iff ((unsigned long)p & 15) == 0.

OTHER TIPS

#define is_aligned(POINTER, BYTE_COUNT) \
    (((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0)

The cast to void * (or, equivalenty, char *) is necessary because the standard only guarantees an invertible conversion to uintptr_t for void *.

If you want type safety, consider using an inline function:

static inline _Bool is_aligned(const void *restrict pointer, size_t byte_count)
{ return (uintptr_t)pointer % byte_count == 0; }

and hope for compiler optimizations if byte_count is a compile-time constant.

Why do we need to convert to void * ?

The C language allows different representations for different pointer types, eg you could have a 64-bit void * type (the whole address space) and a 32-bit foo * type (a segment).

The conversion foo * -> void * might involve an actual computation, eg adding an offset. The standard also leaves it up to the implementation what happens when converting (arbitrary) pointers to integers, but I suspect that it is often implemented as a noop.

For such an implementation, foo * -> uintptr_t -> foo * would work, but foo * -> uintptr_t -> void * and void * -> uintptr_t -> foo * wouldn't. The alignment computation would also not work reliably because you only check alignment relative to the segment offset, which might or might not be what you want.

In conclusion: Always use void * to get implementation-independant behaviour.

Other answers suggest an AND operation with low bits set, and comparing to zero.

But a more straight-forward test would be to do a MOD with the desired alignment value, and compare to zero.

#define ALIGNMENT_VALUE     16u

if (((uintptr_t)ptr % ALIGNMENT_VALUE) == 0)
{
    // ptr is aligned
}

With a function template like

#include <type_traits>

template< typename T >
bool is_aligned(T* p){
    return !(reinterpret_cast<uintptr_t>(p) % std::alignment_of<T>::value);
}

you could check alignment at runtime by invoking something like

struct foo_type{ int bar; }foo;
assert(is_aligned(&foo)); // passes

To check that bad alignments fail, you could do

// would almost certainly fail
assert(is_aligned((foo_type*)(1 + (uintptr_t)(&foo)));

This is basically what I'm using. By making the integer a template, I ensure it's expanded compile time, so I won't end up with a slow modulo operation whatever I do.

I always like checking my input, so hence the compile time assertion. If your alignment value is wrong, well then it won't compile...

template <unsigned int alignment>
struct IsAligned
{
    static_assert((alignment & (alignment - 1)) == 0, "Alignment must be a power of 2");

    static inline bool Value(const void * ptr)
    {
        return (((uintptr_t)ptr) & (alignment - 1)) == 0;
    }
};

To see what's going on, you can use this:

// 1 of them is aligned...
int* ptr = new int[8];
for (int i = 0; i < 8; ++i)
    std::cout << IsAligned<32>::Value(ptr + i) << std::endl;

// Should give '1'
int* ptr2 = (int*)_aligned_malloc(32, 32);
std::cout << IsAligned<32>::Value(ptr2) << std::endl;

Can you just 'and' the ptr with 0x03 (aligned on 4s), 0x07 (aligned on 8s) or 0x0f (aligned on 16s) to see if any of the lowest bits are set?

Leave that to the professionals,

https://www.boost.org/doc/libs/1_65_1/doc/html/align/reference.html#align.reference.functions.is_aligned

bool is_aligned(const void* ptr, std::size_t alignment) noexcept; 

example:

        char D[1];
        assert( boost::alignment::is_aligned(&D[0], alignof(double)) ); //  might fail, sometimes

How about:

void *mem = malloc(1024+15); 
void *ptr =( (*(char*)mem) - (*(char *)mem % 16) );
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top