iOS - Calculate the dot product of each row and/or column of a matrix using Accelerate.framework

https://stackoverflow.com/questions/22733059

23-06-2023

Question

I have two matrix variables (of type float *), called matrixA and matrixB. I need to calculate the dot product for each row of matrixA and each column of matrixB. I am trying to make this as fast as possible, so I am turning to the Accelerate.framework in iOS.

I found that I can loop through each row of matrixA and use the Accelerate.framework function vDSP_svesq(), which calculates the sum of the squares of its input vector (the same thing as the dot product in this case). In my case, the input vector would be each row of the matrix that I am looping through.

For matrixB I believe I can calculate the dot product of each column by using the same vDSP_svesq() function and including a stride value equal to the number of columns in the matrix.

My question is: is there any way to avoid looping through each row and calculating the dot product on each individual row? Is there an Accelerate.framework function which calculates the dot product of each matrix row and/or column without forcing me to do so in a loop?

The documentation for Accelerate.framework is really difficult for me to understand. I'm trying, but...

Any pointers would be greatly appreciated.

Solution

It's not totally clear what you're asking. A dot product takes two vectors as arguments, but you keep talking about "the dot product for each [vector]."

What I think you're asking for is a way to compute the dot product of each row [or column] with itself, which is the l2 norm squared of each row [or column]. The result would be a vector whose ith entry is given by:

result_i = sum_{j=0}^{j<n} A_ij * A_ij

If that's really what you're trying to compute, then calling vDSP_svesq on each row is a perfectly reasonable solution.
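As a minimal sketch of that per-row approach, assuming the matrix is stored row-major with rows x cols entries: the names normsSquaredOfRows and sumOfSquares are hypothetical and not part of Accelerate, and the non-Apple branch is only a plain-C stand-in so the sketch also builds where the framework is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

// Hypothetical helper: sum of the squares of n floats starting at a,
// stepping through memory by `stride` elements.
static void sumOfSquares(const float *a, ptrdiff_t stride, float *out, size_t n) {
#if defined(__APPLE__)
    vDSP_svesq(a, (vDSP_Stride)stride, out, (vDSP_Length)n);
#else
    // Portable stand-in for vDSP_svesq (assumption: no Accelerate available).
    float sum = 0.0f;
    for (size_t j = 0; j < n; ++j) {
        float v = a[(ptrdiff_t)j * stride];
        sum += v * v;
    }
    *out = sum;
#endif
}

// result must have room for `rows` floats; matrix is assumed row-major.
void normsSquaredOfRows(float *result, const float *matrix, int rows, int cols) {
    assert(result != NULL && matrix != NULL && rows > 0 && cols > 0);
    for (int row = 0; row < rows; ++row) {
        // Each row is contiguous in memory, so the stride is 1.
        sumOfSquares(&matrix[(size_t)row * cols], 1, &result[row], (size_t)cols);
    }
}
```

On Apple platforms this would need to be linked with -framework Accelerate. The loop over rows remains, but each call reads one contiguous row.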

For computing the norm squared of the columns, however, I would suggest a different solution. If you try to do it using vDSP_svesq, as you noted, you will need to have a non-unit stride, which will pretty much ruin your performance. Instead, you can do the following:

#include <Accelerate/Accelerate.h>

// Computes the squared l2 norm of each column of a row-major rows x cols matrix.
void normsSquaredOfColumns(float *result, const float * restrict matrix,
                           int rows, int cols) {
    // Initialize result with the squares of the first row.
    vDSP_vsq(matrix, 1, result, 1, cols);
    // Loop over the remaining rows, adding the square of each to the result.
    for (int row = 1; row < rows; ++row)
        vDSP_vma(&matrix[row*cols], 1, &matrix[row*cols], 1, result, 1, result, 1, cols);
}
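
For comparison, the strided alternative mentioned above might look roughly like the sketch below. It is illustrative only: the name normsSquaredOfColumnsStrided is hypothetical, the matrix is assumed to be row-major, and the non-Apple branch is a plain-C stand-in so the sketch also builds without Accelerate.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

// result must have room for `cols` floats; matrix is assumed row-major.
void normsSquaredOfColumnsStrided(float *result, const float *matrix,
                                  int rows, int cols) {
    assert(result != NULL && matrix != NULL && rows > 0 && cols > 0);
    for (int col = 0; col < cols; ++col) {
#if defined(__APPLE__)
        // Walk down one column: consecutive elements are `cols` floats apart.
        vDSP_svesq(&matrix[col], cols, &result[col], (vDSP_Length)rows);
#else
        // Portable stand-in for the strided vDSP_svesq call above.
        float sum = 0.0f;
        for (int row = 0; row < rows; ++row) {
            float v = matrix[(size_t)row * cols + col];
            sum += v * v;
        }
        result[col] = sum;
#endif
    }
}
```

Both versions should produce the same values; the difference is the access pattern. This one jumps cols elements between reads, while the row-accumulating routine reads each row contiguously.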

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow