Question

I have an int array[10000] and I want to iterate from a certain position to find the next non-zero index. Currently I use a basic while loop:

while(array[i] == 0){
    pos++;
}

etc

I know with intrinsics I could test 4 integers for zero at a time, but is there a way to return something indicating the vector index of the "first" non-zero?

Était-ce utile?

La solution

It's fairly simple to do this, but throughput improvement may not be great, since you will probably be limited by memory bandwidth (unless your array is already cached):

int index = -1;
for (i = 0; i < n; i += 4)
{
    __m128i v = _mm_load_si128(&A[i]);
    __m128i vcmp = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    int mask = _mm_movemask_epi8(vcmp);
    if (mask != 0xffff)
    {
        break;
    }
}
if (i < n)
{
    for (j = i; j < i + 4; ++j)
    {
        if (A[j] != 0)
        {
             index = j;
             break;
        }
    }
}

This assumes that the array A is 16 byte aligned, its size, n, is a multiple of 4, and that the ints are 32 bits.

Loop unrolling by a factor of 2 may help, particularly if your input data is large and/or sparse, e.g.

int index = -1;
for (i = 0; i < n; i += 8)
{
    __m128i v0 = _mm_load_si128(&A[i]);
    __m128i v1 = _mm_load_si128(&A[i + 4]);
    __m128i vcmp0 = _mm_cmpeq_epi32(v0, _mm_setzero_si128());
    __m128i vcmp1 = _mm_cmpeq_epi32(v1, _mm_setzero_si128());
    int mask0 = _mm_movemask_epi8(vcmp0);
    int mask1 = _mm_movemask_epi8(vcmp1);
    if ((mask0 | mask1) != 0xffff)
    {
        break;
    }
}
if (i < n)
{
    for (j = i; j < i + 8; ++j)
    {
        if (A[j] != 0)
        {
             index = j;
             break;
        }
    }
}

If you have AVX2 (Haswell and later) then you can process 8 ints at a time rather than 4.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top