Question

I have a huge sparse matrix (1,000 x 1,000,000) that I cannot load into MATLAB (not enough RAM).

I want to visualize this matrix to get an idea of its sparsity and of how its values differ.

Because of the memory constraints, I want to proceed as follows:

1- Divide the matrix into 4 matrices

2- Load each matrix into MATLAB and visualize it so that the colors give an idea of the values (and of the zeros in particular)

3- "Stick" the 4 images I will get in order to have a global idea for the original matrix

(i) Is it possible to load "part of a matrix" in MATLAB?

(ii) For the visualization tool, I read about spy (and daspect). However, spy only shows where the non-zero values are, regardless of their magnitude. Is there a way to add a color code?

(iii) How can I "stick" plots in order to make one?


Solution

If your matrix is sparse, then it seems that the current method of storing it (as a full matrix in a text file) is very inefficient, and certainly makes loading it into MATLAB very hard. However, I suspect that as long as it is sparse enough, it can still be loaded into MATLAB as a sparse matrix.

The traditional way of doing this would be to load it all in at once, then convert to sparse representation. In your case, however, it would make sense to read in the text file, one line at a time, and convert to a MATLAB sparse matrix on-the-fly.

You can find out if this is possible by estimating the sparsity of your matrix, and using this to see if the whole thing could be loaded into MATLAB's memory as a sparse matrix.
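As a rough sketch of that estimate: on 64-bit MATLAB, a sparse double matrix takes roughly 16 bytes per non-zero (8 for the value, 8 for the row index) plus 8 bytes per column for the column pointers. The density below is an assumed placeholder, not a measured figure; replace it with a value estimated from a sample of your file.

```matlab
% Matrix size taken from the question
num_rows = 1000;
num_cols = 1000000;

% ASSUMPTION: fraction of entries that are non-zero (placeholder value)
density = 0.01;

% Estimated number of non-zero entries
nnz_est = density * num_rows * num_cols;

% Approximate storage: 16 bytes per non-zero + 8 bytes per column pointer
bytes_est = 16 * nnz_est + 8 * (num_cols + 1);

fprintf('Estimated sparse storage: %.1f MB\n', bytes_est / 2^20);
```

With the assumed 1% density this comes to roughly 160 MB, compared with about 8 GB for the same matrix stored as a full array of doubles.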

Try something like: (untested code!)

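% initialise sparse matrix (num_rows, num_cols and filename must already be set)
sparse_matrix = sparse(num_rows, num_cols);
row_num = 1;

fid = fopen(filename, 'r');

% read each line of the text file in turn
while true
    this_line = fgetl(fid);
    if ~ischar(this_line), break; end   % fgetl returns -1 at end of file

    % parse the whitespace-separated numbers on this line (column vector)
    values = sscanf(this_line, '%f');

    % add row to sparse matrix (transpose the column vector to a row)
    sparse_matrix(row_num, :) = values';
    row_num = row_num + 1;
end
fclose(fid);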

% visualise using spy
spy(sparse_matrix)
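Assigning into a sparse matrix one row at a time can be slow, because MATLAB stores sparse matrices column by column. A possible alternative, sketched below, is to collect (row, column, value) triplets while reading and build the sparse matrix once at the end. The small test file and its contents are made up purely for illustration, and the file is assumed to be whitespace-separated.

```matlab
% Write a tiny whitespace-separated test file (hypothetical data)
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, '0 0 1.5 0\n0 0 0 0\n2 0 0 -3\n');
fclose(fid);

rows = []; cols = []; vals = [];   % triplets of the non-zero entries
row_num = 0;
num_cols = 0;

fid = fopen(filename, 'r');
while true
    this_line = fgetl(fid);
    if ~ischar(this_line), break; end      % end of file reached
    row_num = row_num + 1;

    values = sscanf(this_line, '%f');      % column vector for this row
    num_cols = max(num_cols, numel(values));

    nz = find(values);                     % column indices of the non-zeros
    rows = [rows; repmat(row_num, numel(nz), 1)];
    cols = [cols; nz];
    vals = [vals; values(nz)];
end
fclose(fid);

% Build the sparse matrix in a single call
sparse_matrix = sparse(rows, cols, vals, row_num, num_cols);
```

Growing the triplet arrays inside the loop is kept for readability; for a very large file it would probably be worth preallocating them from the estimated number of non-zeros.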

Visualisation

With regard to visualisation: visualising a sparse matrix like this with a tool such as imagesc is possible, but I believe it may internally create the full matrix (maybe someone can confirm whether this is true). If it does, it is going to cause you memory problems, since a full 1,000 x 1,000,000 matrix of doubles takes about 8 GB.

All spy is really doing is plotting in 2D the locations of the non-zero elements. You can fairly easily write your own spy function, which can have different coloured or sized points depending on the values at each location. See this answer for some examples.
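A minimal sketch of such a home-made spy, assuming the matrix is already in memory as a sparse matrix: find returns only the non-zero entries, and scatter can colour each point by its value. The example matrix S is a small made-up stand-in for the real data.

```matlab
% Small example matrix standing in for the real data (hypothetical values)
S = sparse([1 2 3 3], [1 4 2 5], [0.5 2.0 -1.0 3.5], 3, 5);

% Row indices, column indices and values of the non-zero entries only
[rows, cols, vals] = find(S);

% One point per non-zero entry, coloured by its value
scatter(cols, rows, 10, vals, 'filled');

% Put row 1 at the top and match the axes to the matrix size, as spy does
set(gca, 'YDir', 'reverse');
axis([0.5, size(S, 2) + 0.5, 0.5, size(S, 1) + 0.5]);
xlabel('column');
ylabel('row');
colorbar;   % shows which colour corresponds to which value
```

Because only the non-zero entries are plotted, the full matrix is never created; with many millions of non-zeros, though, the plot itself may still be slow to draw.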


Saving sparse matrices

As I said above, the way your matrix is currently saved is pretty inefficient: for a matrix with 10% sparsity, around 95% of your text file will be a zero or a space. I don't know where this data has come from, but if you have any control over its creation (e.g. it comes from another program you have written), it would make much more sense to save only the non-zero elements in the format row_idx, col_idx, value.

You can then use spconvert to import the sparse matrix directly.
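As an illustration, here is a small sketch of that round trip. The triplet values and the file name are made up; the final row with a zero value is one way to tell spconvert the overall size of the matrix when the last row or column contains no non-zero entries.

```matlab
% Hypothetical triplet data: one [row_idx col_idx value] row per non-zero.
% The last row [3 5 0] only fixes the matrix size at 3 x 5.
triplets = [1 1  0.5;
            2 4  2.0;
            3 2 -1.0;
            3 5  0];

% Save as plain text, one triplet per line (hypothetical file name)
triplet_file = [tempname '.txt'];
save(triplet_file, 'triplets', '-ascii');

% Load the three-column text file and convert it to a sparse matrix
D = load(triplet_file, '-ascii');
S = spconvert(D);
```

The text file then grows with the number of non-zero entries rather than with the full size of the matrix.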

OTHER TIPS

One of the simplest methods (if you can actually store the full sparse matrix in RAM) is to use gnuplot to visualize the sparsity pattern.

I was able to spy matrices of size 10-20GB using gnuplot without problems, but make sure you output the image in png or jpeg format. Note that you don't need the values of the non-zero entries, only their integer (row, col) positions. Plot them with: plot "row_col.dat" using 1:2 with points

This uses the row index as the x axis and the column index as the y axis, and plots one point per non-zero entry. It is very easy to do, and it is the most scalable solution I know. Gnuplot works at decent speed even for very large datasets (>10GB of [row, col] pairs), whereas Matlab just hangs (with due respect).
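If the matrix is available in MATLAB as a sparse matrix, a file in that two-column layout could be produced along the following lines. This is only a sketch: the example matrix is made up, and writing row_col.dat to the temporary directory is an arbitrary choice.

```matlab
% Small example matrix standing in for the real data (hypothetical values)
S = sparse([1 2 3 3], [1 4 2 5], [0.5 2.0 -1.0 3.5], 3, 5);

% Positions of the non-zero entries; the values are not needed
[rows, cols] = find(S);

% Write one "row col" pair per line
out_file = fullfile(tempdir, 'row_col.dat');
fid = fopen(out_file, 'w');
fprintf(fid, '%d %d\n', [rows, cols]');   % transpose so each pair is written together
fclose(fid);
```

In gnuplot, preceding the plot command with set terminal png and set output "sparsity.png" (the output name is just an example) writes the image to a file instead of opening a window.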

I use imagesc() to visualise arrays. It scales the values in the array so that the smallest and largest values span the full range of the colormap, then plots the array like a bitmap image (of course you can change the colormap to make it easier to see detail).
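One way to combine this with the block-wise plan from the question is sketched below: each block of columns is converted to a full array on its own, drawn with imagesc in its own subplot, and all blocks share the same colour limits so that colours are comparable across panels. The random matrix and the choice of 4 blocks are assumptions for illustration only.

```matlab
% Hypothetical stand-in for the real matrix: 100 x 4000 with about 1% non-zeros
S = sprand(100, 4000, 0.01);

num_blocks = 4;
block_width = size(S, 2) / num_blocks;

% Common colour limits so that all panels use the same colour scale
clims = [0, full(max(S(:)))];

for k = 1:num_blocks
    col_range = (k - 1) * block_width + 1 : k * block_width;

    % Only this block is converted to a full array
    block = full(S(:, col_range));

    subplot(num_blocks, 1, k);
    imagesc(block, clims);
    title(sprintf('columns %d to %d', col_range(1), col_range(end)));
end

colormap(hot);   % any colormap can be used here
```

For the real 1,000 x 1,000,000 matrix a quarter-size block is still about 2 GB as a full array of doubles, so more and narrower blocks would probably be needed.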

Licensed under: CC-BY-SA with attribution