Question

this is my third post and attempt to solve this problem, which first showed up using numpy.dot(A, A.T) where A is large, 150,000 x 265 elements.

With numpy, I got back an array with many missing values, that were just zeros. I've tried to call BLAS thru CBLAS. I'm getting a segmentation fault error with large arrays.

I'm running this on a machine with about 250 GB free memory. Thanks for reading...

#include <stdio.h>              /* I/O lib         ISOC  */
#include <stdlib.h>             /* Standard Lib    ISOC  */
#include <cblas.h>              /* C BLAS          BLAS  */
#include "blaio.h"

int main(int argc, char **argv) {

int row = 100000;
int col = 265;

float *a, *b, *c;

a = (float *) malloc(row * col * sizeof(float));
b = (float *) malloc(row * col * sizeof(float));
c = (float *) malloc(row * row * sizeof(float));

int i, end;
end = row * col;
for(i=0; i<end; i++)
{
    a[i] = 1.0;
    b[i] = 1.0; 
}

for(i=0;  i<(row*row); i++)
    c[i] = 2.0;

//          row_order      transform     transform     rowsA colsB  K   alpha  a  lda  b  ldb  beta  c   ldc
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, row,  row,  col, 1.0f,  a, col, b, row, 0.0f, c,  row);


int num_bad = 0;
for(i=0; i<(row*row); i++)
{
    if (c[i] != col)
        {
        printf("Bad value found: %f,  at index: %i\n", c[i], i );
        num_bad += 1;
        }
}

printf("Number of bad values found: %i \n\n", num_bad);


//printMatrix(CblasRowMajor, row, row, c, 8, 3, NULL, NULL, NULL, NULL, NULL, "c = ");

return 0;
} /* end func main */

UPDATE: Ray has expertly noticed that the blas I'm using via cblas, must be 32 bit and not able to access the array indices. Therefore, I've installed blas64.x86_64 and blas64-devel.x86_64.

Then, rewrote a few lines of the code above to use the direct call to sgemm without cblas.

#include <stdio.h>              /* I/O lib         ISOC  */
#include <stdlib.h>             /* Standard Lib    ISOC  */

int main(int argc, char **argv) {

int row = 100000;
int col = 265;

float *a, *b, *c;

a = (float *) malloc(row * col * sizeof(float));
b = (float *) malloc(row * col * sizeof(float));
c = (float *) malloc(row * row * sizeof(float));

int i, end;
end = row * col;

for(i=0; i<end; i++)
    {
    a[i] = 1.0;
    b[i] = 1.0; 
    }

for(i=0;  i<(row*row); i++)
    c[i] = 2.0;

float alpha = 1.0, beta = 1.0;
sgemm_('N','N', &row,  &row,  &col, &alpha,  &a[0], &col, &b[0], &row, &beta, &c[0],  &row);

I compiled with:

gcc sgemm_test_fortran.c -o test  -L /usr/lib64 -lblas64

The code compiled and I think it might run.. :)

Was it helpful?

Solution

The problem is that the size of your output matrix (100,000x100,000 = 1e10 elements) can't be stored in an int (2.14e9). You can fix this in your C++ code by switching the types to size_t, but you're going to run into the same problem inside the BLAS library.

What you need to to do is use a BLAS library that is compiled to use 8-byte integers; most BLAS libraries are compiled with 4-byte integers. You don't mention what BLAS library you're linking to, so it's hard to guess what the correct library name is (if it even exists) on your system.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top