Question

I have a 2D matrix stored in a flat buffer along diagonals. For example a 4x4 matrix would have its indexes scattered like so:

 0   2   5   9
 1   4   8  12
 3   7  11  14
 6  10  13  15

With this representation, what is the most efficient way to calculate the index of a neighboring element given the original index and a X/Y offset? For example:

// return the index of a neighbor given an offset
int getNGonalNeighbor(const size_t index,
                      const int x_offset,
                      const int y_offset){
    //...
}

// for the array above:
getNGonalNeighbor(15,-1,-1); // should return 11
getNGonalNeighbor(15, 0,-1); // should return 14
getNGonalNeighbor(15,-1, 0); // should return 13
getNGonalNeighbor(11,-2,-1); // should return 1

We assume here that overflow never occurs and there is no wrap-around.

I have a solution involving a lot of triangular number and triangular root calculations. It also contains a lot of branches, which I would prefer to replace with algebra if possible (this will run on GPUs where diverging control flow is expensive). My solution is working but very lengthy. I feel like there must be a much simpler and less compute intensive way of doing it.

Maybe it would help me if someone can put a name on this particular problem/representation.

I can post my full solution if anyone is interested, but as I said it is very long and relatively complicated for such a simple task. In a nutshell, my solution does:

  • translate the original index into a larger triangular matrix to avoid dealing with 2 triangles (for example 13 would become 17)

For the 4x4 matrix this would be:

0   2   5   9   14  20  27
1   4   8   13  19  26  
3   7   12  18  25  
6   11  17  24  
10  16  23
15  22  
21  
  • calculate the index of the diagonal of the neighbor in this representation using the manhattan distance of the offset and the triangular root of the index.
  • calculate the position of the neighbor in this diagonal using the offset
  • translate back to the original representation by removing the padding.

For some reason this is the simplest solution i could come up with.

Edit:

having loop to accumulate the offset:

I realize that given the properties of the triangle numbers, it would be easier to split up the matrix in two triangles (let's call 0 to 9 'upper triangle' and 10 to 15 'lower triangle') and have a loop with a test inside to accumulate the offset by adding one while in the upper triangle and subtracting one in the lower (if that makes sense). But for my solution loops must be avoided at all cost, especially loops with unbalanced trip counts (again, very bad for GPUs).

So I am looking more for an algebraic solution rather than an algorithmic one.

Building a lookup table:

Again, because of the GPU, it is preferable to avoid building a lookup table and have random accesses in it (very expensive). An algebraic solution is preferable.

Properties of the matrix:

  • The size of the matrix is known.
  • For now I only consider square matrix, but a solution for rectangular ones as well would be nice.
  • as the name of the function in my example suggests, extending the solution to N-dimensional volumes (hence N-gonal flattening) would be a big plus too.
Was it helpful?

Solution 2

I actually already had the elements to solve it somewhere else in my code. As BLUEPIXY's solution hinted, I am using scatter/gather operations, which I had already implemented for layout transformation.

This solution basically rebuilds the original (x,y) index of the given element in the matrix, applies the index offset and translates the result back to the transformed layout. It splits the square in 2 triangles and adjust the computation depending on which triangle it belongs to.

It is an almost entirely algebraic transformation: it uses no loop and no table lookup, has a small memory footprint and little branching. The code can probably be optimized further.

Here is the draft of the code:

#include <stdio.h>
#include <math.h>

// size of the matrix
#define SIZE 4

// triangle number of X
#define TRIG(X) (((X) * ((X) + 1)) >> 1)
// triangle root of X
#define TRIROOT(X) ((int)(sqrt(8*(X)+1)-1)>>1);

// return the index of a neighbor given an offset
int getNGonalNeighbor(const size_t index,
                      const int x_offset,
                      const int y_offset){
    // compute largest upper triangle index
    const size_t upper_triangle = TRIG(SIZE);

    // position of the actual element of index
    unsigned int x = 0,y = 0;

    // adjust the index depending of upper/lower triangle.
    const size_t adjusted_index = index < upper_triangle ?
                index :
                SIZE * SIZE - index - 1;

    // compute triangular root
    const size_t triroot = TRIROOT(adjusted_index);
    const size_t trig = TRIG(triroot);
    const size_t offset = adjusted_index - trig;

    // upper triangle
    if(index < upper_triangle){
        x = offset;
        y = triroot-offset;
    }
    // lower triangle
    else {
        x = SIZE - offset - 1;
        y = SIZE - (trig + triroot + 1 - adjusted_index);
    }

    // adjust the offset
    x += x_offset;
    y += y_offset;

    // manhattan distance
    const size_t man_dist = x+y;

    // calculate index using triangular number
    return TRIG(man_dist) +
            (man_dist >= SIZE ? x - (man_dist - SIZE + 1) : x) -
            (man_dist > SIZE ? 2* TRIG(man_dist - SIZE) : 0);
}

int main(){
    printf("%d\n", getNGonalNeighbor(15,-1,-1)); // should return 11
    printf("%d\n", getNGonalNeighbor(15, 0,-1)); // should return 14
    printf("%d\n", getNGonalNeighbor(15,-1, 0)); // should return 13
    printf("%d\n", getNGonalNeighbor(11,-2,-1)); // should return 1
}

And the output is indeed:

11
14
13
1

If you think this solution looks over complicated and inefficient, I remind you that the target here is GPU, where computation costs virtually nothing compared to memory accesses, and all index computations are computed at the same time using massively parallel architectures.

OTHER TIPS

Table lookup

#include <stdio.h>

#define SIZE 16
#define SIDE  4  //sqrt(SIZE)

int table[SIZE];
int rtable[100];// {x,y| x<=99, y<=99 }

void setup(){
    int i, x, y, xy, index;//xy = x + y

    x=y=xy=0;
    for(i=0;i<SIZE;++i){
        table[i]= index= x*10 + y;
        rtable[x*10+y]=i;
        x = x + 1; y = y - 1;//right up
        if(y < 0 || x >= SIDE){
            ++xy;
            x = 0;
            y = xy;;
            while(y>=SIDE){
                ++x;
                --y;
            }
        }
    }
}
int getNGonalNeighbor(int index, int offsetX, int offsetY){
    int x,y;
    x=table[index] / 10 + offsetX;
    y=table[index] % 10 + offsetY;
    if(x < 0 || x >= SIDE || y < 0 || y >= SIDE) return -1; //ERROR
    return rtable[ x*10+y ];
}

int main() {
    int i;
    setup();
    printf("%d\n", getNGonalNeighbor(15,-1,-1));
    printf("%d\n", getNGonalNeighbor(15, 0,-1));
    printf("%d\n", getNGonalNeighbor(15,-1, 0));
    printf("%d\n", getNGonalNeighbor(11,-2,-1));
    printf("%d\n", getNGonalNeighbor(0, -1,-1));

    return 0;
}

don't use table version.

#include <stdio.h>

#define SIZE 16
#define SIDE  4

void num2xy(int index, int *offsetX, int *offsetY){
    int i, x, y, xy;//xy = x + y

    x=y=xy=0;
    for(i=0;i<SIZE;++i){
        if(i == index){
            *offsetX = x;
            *offsetY = y;
            return;
        }
        x = x + 1; y = y - 1;//right up
        if(y < 0 || x >= SIDE){
            ++xy;
            x = 0;
            y = xy;;
            while(y>=SIDE){
                ++x;
                --y;
            }
        }
    }
}
int xy2num(int offsetX, int offsetY){
    int i, x, y, xy, index;//xy = x + y

    x=y=xy=0;
    for(i=0;i<SIZE;++i){
        if(offsetX == x && offsetY == y) return i;
        x = x + 1; y = y - 1;//right up
        if(y < 0 || x >= SIDE){
            ++xy;
            x = 0;
            y = xy;;
            while(y>=SIDE){
                ++x;
                --y;
            }
        }
    }
    return -1;
}
int getNGonalNeighbor(int index, int offsetX, int offsetY){
    int x,y;

    num2xy(index, &x, &y);

    return xy2num(x + offsetX, y + offsetY);
}

int main() {
    printf("%d\n", getNGonalNeighbor(15,-1,-1));
    printf("%d\n", getNGonalNeighbor(15, 0,-1));
    printf("%d\n", getNGonalNeighbor(15,-1, 0));
    printf("%d\n", getNGonalNeighbor(11,-2,-1));
    printf("%d\n", getNGonalNeighbor(0, -1,-1));

    return 0;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top