What is the best way to store a 16 × (2^20) matrix in MATLAB?

https://stackoverflow.com/questions/2899772

04-10-2019

Question

I am thinking of writing the data to a file. Does anyone have an example of how to write a large amount of data to a file?

Edit: Most elements in the matrix are zeroes, others are uint32. I guess the simplest save() and load() would work, as @Jonas suggested.

Solution

I guess nobody's seen the edit about the zeroes :)

If they're mostly zeroes, you should convert your matrix to its sparse representation and then save it. You can do that with the sparse function.

Code

z = zeros(10000,10000);
z(123,456) = 1;
whos z
z = sparse(z);
whos z

Output

Name          Size                   Bytes  Class     Attributes

  z         10000x10000            800000000  double  

Name          Size               Bytes  Class     Attributes

  z         10000x10000            40016  double    sparse

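As far as I know, the sparse implementation is not designed to handle uint32: sparse matrices hold only double (or logical) values, so a uint32 matrix has to be converted to double before it is passed to sparse. Every uint32 value is exactly representable as a double, so nothing is lost in the conversion.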
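For the 16 × (2^20) uint32 case from the question, a minimal sketch of the round trip might look like the following. The file name 'sparseData.mat' and the sample values are made up for illustration. It uses find to pull out the non-zero entries first, so the full matrix never has to be copied to double.

```matlab
%# Hypothetical sample data: a mostly-zero uint32 matrix
data = zeros(16, 2^20, 'uint32');
data(3, 12345) = uint32(4294967295);  %# Largest uint32 value
data(16, 2^20) = uint32(7);

%# Build the sparse matrix from the non-zero entries only
[r, c, v] = find(data);               %# v has class uint32
s = sparse(r, c, double(v), size(data, 1), size(data, 2));

%# Save the sparse matrix, then load it back into a struct
save('sparseData.mat', 's');
clear s
loaded = load('sparseData.mat');

%# Convert back to a full uint32 matrix and check the round trip
restored = uint32(full(loaded.s));
assert(isequal(restored, data));
```

Converting back to a full uint32 matrix is only needed if the rest of the code requires it. Many operations work directly on the sparse double matrix.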

OTHER TIPS

If you're concerned with keeping the size of the data file as small as possible, here are some suggestions:

Write the data to a binary file (i.e. using FWRITE) instead of to a text file (i.e. using FPRINTF).
If your data contains all integer values, convert it to or save it as a signed or unsigned integer type instead of the default double precision type MATLAB uses.
If your data contains floating point values, but you don't need the range or resolution of the default double precision type, convert it to or save it as a single precision type.
If your data is sufficiently sparse (i.e. there are many more zeroes than non-zeroes in your matrix), then you can use the FIND function to get the row and column indices of the non-zero values, then save just these indices and the non-zero values to your file.

Here are a couple of examples to illustrate:

data = double(rand(16,2^20) <= 0.00001);  %# A large but very sparse matrix

%# Writing the values as type double:
fid = fopen('data_double.dat','w');  %# Open the file
fwrite(fid,size(data),'uint32');     %# Write the matrix size (2 values)
fwrite(fid,data,'double');           %# Write the data as type double
fclose(fid);                         %# Close the file

%# Writing the values as type uint8:
fid = fopen('data_uint8.dat','w');  %# Open the file
fwrite(fid,size(data),'uint32');    %# Write the matrix size (2 values)
fwrite(fid,data,'uint8');           %# Write the data as type uint8
fclose(fid);                        %# Close the file

%# Writing out only the non-zero values:
[rowIndex,columnIndex,values] = find(data);  %# Get the row and column indices
                                             %#   and the non-zero values
fid = fopen('data_sparse.dat','w');  %# Open the file
fwrite(fid,numel(values),'uint32');  %# Write the length of the vectors (1 value)
fwrite(fid,rowIndex,'uint32');       %# Write the row indices
fwrite(fid,columnIndex,'uint32');    %# Write the column indices
fwrite(fid,values,'uint8');          %# Write the non-zero values
fclose(fid);                         %# Close the file

The files created above will differ drastically in size. The file 'data_double.dat' will be about 131,073 KB, 'data_uint8.dat' will be about 16,385 KB, and 'data_sparse.dat' will be less than 2 KB.
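The sizes quoted above follow from simple arithmetic, sketched here as a sanity check. The variable names are illustrative, and the sparse figure is only an expectation, since the number of non-zeros in a random matrix varies from run to run.

```matlab
nElements   = 16 * 2^20;   %# Number of matrix elements
headerBytes = 2 * 4;       %# Two uint32 values for the matrix size

%# Dense files: header plus 8 bytes (double) or 1 byte (uint8) per element
doubleKB = (headerBytes + nElements * 8) / 1024;
uint8KB  = (headerBytes + nElements * 1) / 1024;

%# Sparse file: one uint32 count, then per non-zero entry two uint32
%# indices (4 + 4 bytes) and one uint8 value (1 byte)
expectedNonZeros = nElements * 0.00001;
sparseKB = (4 + expectedNonZeros * (4 + 4 + 1)) / 1024;

fprintf('double: %.1f KB\n', doubleKB);
fprintf('uint8:  %.1f KB\n', uint8KB);
fprintf('sparse: %.1f KB (expected)\n', sparseKB);
```

The dense files come out just over 131,072 KB and 16,384 KB because of the 8-byte header, which is why a file browser rounds them up to 131,073 KB and 16,385 KB.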

Note that I also wrote the data/vector sizes to the files so that the data can be read back in (using FREAD) and reshaped properly. Note also that if I did not supply a 'double' or 'uint8' argument to FWRITE, it would fall back to its default precision of 'uint8' and write each element as 8 bits. That happens to be fine here, since the values are all 0 and 1, but it is a fixed default rather than something MATLAB infers from the data, so values that don't fit in 8 bits would not be stored faithfully.
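Reading the sparse file back could look roughly like this. The sketch first rewrites 'data_sparse.dat' in the same layout as above so that it runs on its own. It assumes the matrix dimensions (16 × 2^20) are known to the reader, because that file stores only the vector length and not the matrix size. The names n, rowIn, colIn and valIn are illustrative.

```matlab
%# Write a file in the same layout as 'data_sparse.dat' above
data = double(rand(16, 2^20) <= 0.00001);
[rowIndex, columnIndex, values] = find(data);
fid = fopen('data_sparse.dat', 'w');
fwrite(fid, numel(values), 'uint32');
fwrite(fid, rowIndex, 'uint32');
fwrite(fid, columnIndex, 'uint32');
fwrite(fid, values, 'uint8');
fclose(fid);

%# Read everything back in the order it was written
fid = fopen('data_sparse.dat', 'r');
n     = fread(fid, 1, 'uint32');   %# Length of the three vectors
rowIn = fread(fid, n, 'uint32');   %# Row indices
colIn = fread(fid, n, 'uint32');   %# Column indices
valIn = fread(fid, n, 'uint8');    %# Non-zero values
fclose(fid);

%# Rebuild the matrix (dimensions assumed known) and check it
restored = full(sparse(rowIn, colIn, valIn, 16, 2^20));
assert(isequal(restored, data));
```

For the dense files, the equivalent steps would be to read the two size values first and then pass them to FREAD as the size argument for the data that follows.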

How is the data generated? How do you need to access the data?

If I calculate correctly, the variable takes 128 MB (16 × 2^20 elements × 8 bytes) if it's all double. Thus, you can easily save and load it as a single .mat file if you only need to access it from MATLAB.

%# create data
data = zeros(16,2^20);

%# save data
save('myFile.mat','data');

%# clear data to test everything works
clear data

%# load data
load('myFile.mat')
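Since the question's edit says the non-zero entries are uint32, the same approach also works with the matrix stored as uint32, which halves the memory needed compared to double. A small sketch, with a hypothetical file name and sample value:

```matlab
%# create data as uint32 (4 bytes per element instead of 8)
data = zeros(16, 2^20, 'uint32');
data(1, 1) = 42;

%# save data
save('myFileUint32.mat', 'data');

%# inspect the file contents without loading them
info = whos('-file', 'myFileUint32.mat');
disp(info.class)

%# load into a struct so no workspace variable gets overwritten
loaded = load('myFileUint32.mat');
assert(isequal(loaded.data, data));
```

Loading into a struct (loaded = load(...)) rather than calling load on its own keeps the loaded variables separate from whatever is already in the workspace.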

Licensed under: CC-BY-SA with attribution
