Question

I have an int array[10000] and I want to iterate from a certain position to find the next non-zero index. Currently I use a basic while loop:

while(array[i] == 0){
    pos++;
}

etc

I know with intrinsics I could test 4 integers for zero at a time, but is there a way to return something indicating the vector index of the "first" non-zero?

Was it helpful?

Solution

It's fairly simple to do this, but throughput improvement may not be great, since you will probably be limited by memory bandwidth (unless your array is already cached):

int index = -1;
for (i = 0; i < n; i += 4)
{
    __m128i v = _mm_load_si128(&A[i]);
    __m128i vcmp = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    int mask = _mm_movemask_epi8(vcmp);
    if (mask != 0xffff)
    {
        break;
    }
}
if (i < n)
{
    for (j = i; j < i + 4; ++j)
    {
        if (A[j] != 0)
        {
             index = j;
             break;
        }
    }
}

This assumes that the array A is 16 byte aligned, its size, n, is a multiple of 4, and that the ints are 32 bits.

Loop unrolling by a factor of 2 may help, particularly if your input data is large and/or sparse, e.g.

int index = -1;
for (i = 0; i < n; i += 8)
{
    __m128i v0 = _mm_load_si128(&A[i]);
    __m128i v1 = _mm_load_si128(&A[i + 4]);
    __m128i vcmp0 = _mm_cmpeq_epi32(v0, _mm_setzero_si128());
    __m128i vcmp1 = _mm_cmpeq_epi32(v1, _mm_setzero_si128());
    int mask0 = _mm_movemask_epi8(vcmp0);
    int mask1 = _mm_movemask_epi8(vcmp1);
    if ((mask0 | mask1) != 0xffff)
    {
        break;
    }
}
if (i < n)
{
    for (j = i; j < i + 8; ++j)
    {
        if (A[j] != 0)
        {
             index = j;
             break;
        }
    }
}

If you have AVX2 (Haswell and later) then you can process 8 ints at a time rather than 4.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top