Question

I have a boolean expression that I have managed to implement in SSE2. Now I would like to try implementing it in AVX, exploiting the additional factor of 2 in parallelism (going from 128-bit to 256-bit SIMD types). However, AVX does not support integer operations (AVX2 does, but I am working on a Sandy Bridge processor, so that is not an option currently). Since there are AVX intrinsics for bitwise operations, I figured I could give it a try by simply converting my integer types to float types and seeing if it works.

First test was a success:

__m256 ones = _mm256_set_ps(1,1,1,1,1,1,1,1);
__m256 twos = _mm256_set_ps(2,2,2,2,2,2,2,2); 
__m256 result = _mm256_and_ps(ones, twos);

I'm getting all 0's, as I am supposed to. Similarly, AND'ing the twos instead (2 AND 2), I get a result of 2. But when trying 11 XOR 4 the same way:

__m256 elevens = _mm256_set_ps(11,11,11,11,11,11,11,11); 
__m256 fours = _mm256_set_ps(4,4,4,4,4,4,4,4); 
__m256 result2 = _mm256_xor_ps(elevens, fours); 

The result is 6.46e-38 (i.e. close to 0) and not 15. Similarly, doing 11 OR 4 gives me a value of 22 instead of the expected 15. I don't understand why this is. Is it a bug, or some configuration I am missing?

I was actually expecting my hypothesis of working with floats as if they were integers not to work, since an integer initialized as a float value might not be stored as the precise value but only as a close approximation. But even then, I am surprised by the results I get.

Does anyone have a solution to this problem, or must I upgrade my CPU to get AVX2 support to enable this?


The solution

The first test worked by accident.

1 as a float is 0x3f800000 and 2 is 0x40000000, so their bitwise AND just happens to come out as 0. In general, it won't work that way.
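For instance, here is a quick scalar check (plain C, just for illustration, not part of the original answer) of what the XOR test from the question actually computes on the raw bit patterns:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    float eleven = 11.0f, four = 4.0f;
    uint32_t ei, fi, xi;
    float x;

    memcpy(&ei, &eleven, sizeof ei);   /* 11.0f has the bit pattern 0x41300000 */
    memcpy(&fi, &four,   sizeof fi);   /*  4.0f has the bit pattern 0x40800000 */
    xi = ei ^ fi;                      /* XOR of the raw bits gives 0x01B00000 */
    memcpy(&x, &xi, sizeof x);         /* ...which, read back as a float, is a tiny number */

    printf("0x%08X ^ 0x%08X = 0x%08X -> %g\n",
           (unsigned)ei, (unsigned)fi, (unsigned)xi, x);
    /* prints: 0x41300000 ^ 0x40800000 = 0x01B00000 -> 6.46522e-38 */
    return 0;
}

That tiny float is exactly the "close to 0" value reported in the question: the XOR is applied to the float encodings of 11 and 4, not to the integers 11 and 4.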

But you can absolutely do it; you just have to make sure that you're working with the right bit pattern. Don't convert your integers to floats: reinterpret-cast them. That corresponds to intrinsics such as _mm256_castsi256_ps, or to storing your ints to memory and reading them back as floats (that won't change them; in general, only math operations care about what the floats mean, the rest work on the raw bit patterns; check the list of exceptions an instruction can raise to make sure).
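As a minimal sketch (AVX only, no AVX2), the XOR test from the question could be redone on true integer bit patterns like this; _mm256_set1_epi32 is used here purely as a convenient initializer, and compilers expand it without needing AVX2 instructions:

#include <stdio.h>
#include <immintrin.h>

int main(void) {
    /* 8 lanes holding the integers 11 and 4 */
    __m256i elevens = _mm256_set1_epi32(11);
    __m256i fours   = _mm256_set1_epi32(4);

    /* reinterpret-cast to float, XOR the raw bits, reinterpret-cast back */
    __m256i result = _mm256_castps_si256(
        _mm256_xor_ps(_mm256_castsi256_ps(elevens), _mm256_castsi256_ps(fours)));

    int out[8];
    _mm256_storeu_si256((__m256i*)out, result);
    for (int i = 0; i < 8; i++) printf("%d ", out[i]);   /* 15 in every lane */
    printf("\n");
    return 0;
}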

Other tips

You don't need AVX2 to use the AVX integer load and store operations: see the Intel Intrinsics Guide. So you can load your integers using AVX, reinterpret-cast them to float, use the float bitwise operations, and then reinterpret-cast the result back to int. The reinterpret-casts don't generate any instructions; they just keep the compiler happy. Try this:

//compiled and run on an Ivy Bridge system with AVX but without AVX2
#include <stdio.h>
#include <immintrin.h>

int main() {
    int a[8] = {0, 2, 4, 6, 8, 10, 12, 14};
    int b[8] = {1, 1, 1, 1, 1,  1,  1,  1};
    int c[8];

    //the 256-bit integer loads/stores only require AVX, not AVX2
    __m256i a8 = _mm256_loadu_si256((__m256i*)a);
    __m256i b8 = _mm256_loadu_si256((__m256i*)b);
    //reinterpret as float, OR the raw bits, reinterpret back to int
    __m256i c8 = _mm256_castps_si256(
        _mm256_or_ps(_mm256_castsi256_ps(a8), _mm256_castsi256_ps(b8)));
    _mm256_storeu_si256((__m256i*)c, c8);

    for (int i = 0; i < 8; i++) printf("%d ", c[i]);
    printf("\n");
    //output: 1 3 5 7 9 11 13 15
    return 0;
}

Of course, as Mystical pointed out, this might not be worth doing, but that does not mean you can't do it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow