문제

this is my third post and attempt to solve this problem, which first showed up using numpy.dot(A, A.T) where A is large, 150,000 x 265 elements.

With numpy, I got back an array with many missing values, that were just zeros. I've tried to call BLAS thru CBLAS. I'm getting a segmentation fault error with large arrays.

I'm running this on a machine with about 250 GB free memory. Thanks for reading...

#include <stdio.h>              /* I/O lib         ISOC  */
#include <stdlib.h>             /* Standard Lib    ISOC  */
#include <cblas.h>              /* C BLAS          BLAS  */
#include "blaio.h"

int main(int argc, char **argv) {

int row = 100000;
int col = 265;

float *a, *b, *c;

a = (float *) malloc(row * col * sizeof(float));
b = (float *) malloc(row * col * sizeof(float));
c = (float *) malloc(row * row * sizeof(float));

int i, end;
end = row * col;
for(i=0; i<end; i++)
{
    a[i] = 1.0;
    b[i] = 1.0; 
}

for(i=0;  i<(row*row); i++)
    c[i] = 2.0;

//          row_order      transform     transform     rowsA colsB  K   alpha  a  lda  b  ldb  beta  c   ldc
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, row,  row,  col, 1.0f,  a, col, b, row, 0.0f, c,  row);


int num_bad = 0;
for(i=0; i<(row*row); i++)
{
    if (c[i] != col)
        {
        printf("Bad value found: %f,  at index: %i\n", c[i], i );
        num_bad += 1;
        }
}

printf("Number of bad values found: %i \n\n", num_bad);


//printMatrix(CblasRowMajor, row, row, c, 8, 3, NULL, NULL, NULL, NULL, NULL, "c = ");

return 0;
} /* end func main */

UPDATE: Ray has expertly noticed that the blas I'm using via cblas, must be 32 bit and not able to access the array indices. Therefore, I've installed blas64.x86_64 and blas64-devel.x86_64.

Then, rewrote a few lines of the code above to use the direct call to sgemm without cblas.

#include <stdio.h>              /* I/O lib         ISOC  */
#include <stdlib.h>             /* Standard Lib    ISOC  */

int main(int argc, char **argv) {

int row = 100000;
int col = 265;

float *a, *b, *c;

a = (float *) malloc(row * col * sizeof(float));
b = (float *) malloc(row * col * sizeof(float));
c = (float *) malloc(row * row * sizeof(float));

int i, end;
end = row * col;

for(i=0; i<end; i++)
    {
    a[i] = 1.0;
    b[i] = 1.0; 
    }

for(i=0;  i<(row*row); i++)
    c[i] = 2.0;

float alpha = 1.0, beta = 1.0;
sgemm_('N','N', &row,  &row,  &col, &alpha,  &a[0], &col, &b[0], &row, &beta, &c[0],  &row);

I compiled with:

gcc sgemm_test_fortran.c -o test  -L /usr/lib64 -lblas64

The code compiled and I think it might run.. :)

도움이 되었습니까?

해결책

The problem is that the size of your output matrix (100,000x100,000 = 1e10 elements) can't be stored in an int (2.14e9). You can fix this in your C++ code by switching the types to size_t, but you're going to run into the same problem inside the BLAS library.

What you need to to do is use a BLAS library that is compiled to use 8-byte integers; most BLAS libraries are compiled with 4-byte integers. You don't mention what BLAS library you're linking to, so it's hard to guess what the correct library name is (if it even exists) on your system.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top