What is the best way to store a 16 × (2^20) matrix in MATLAB?

https://stackoverflow.com/questions/2899772

04-10-2019

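Question

I am thinking of writing the data to a file. Does anyone have an example of how to write a large amount of data to a file?

Edit: Most of the elements in the matrix are zeros, and the others are uint32. I guess the simplest save() and load() would work, as @Jonas suggested.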

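Solution

I guess nobody saw the edit about the zeros :)

If they are mostly zeros, you should convert your matrix to its sparse representation and then save it. You can do that with the sparse function.

Code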

z = zeros(10000,10000);
z(123,456) = 1;
whos z
z = sparse(z);
whos z

Output

Name          Size                   Bytes  Class     Attributes

  z         10000x10000            800000000  double  

Name          Size               Bytes  Class     Attributes

  z         10000x10000            40016  double    sparse

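Note that I don't think the sparse implementation is designed to handle uint32: sparse storage works with double (and logical) values, so uint32 data has to be converted to double before calling sparse.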
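A minimal sketch of that approach for the uint32 case from the question, assuming the values are converted to double before calling sparse (every uint32 value is exactly representable as a double). The file name, the positions, and the values are made up for illustration.

```matlab
% Hypothetical data: a 16-by-2^20 uint32 matrix that is almost entirely zero
data = zeros(16, 2^20, 'uint32');
data(3, 1000)  = 42;
data(16, 2^20) = 4000000000;          % larger than any int32, still fits in uint32

% Convert to double first, then build the sparse representation
s = sparse(double(data));

fileName = [tempname '.mat'];         % hypothetical file location
save(fileName, 's');

% Load it back and restore the original class
loaded   = load(fileName);
restored = uint32(full(loaded.s));

sameData = isequal(restored, data);   % true if the round trip is lossless
delete(fileName);
```

The conversion back with full and uint32 is only needed if later code depends on the integer class; many operations can work on the sparse double matrix directly.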

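Other tips

If you are concerned with keeping the size of the data file as small as possible, here are some suggestions:

Write the data to a binary file (i.e. using fwrite) instead of to a text file (i.e. using fprintf).
If your data contains only integer values, convert it to or save it as a signed or unsigned integer type instead of the default double-precision type MATLAB uses.
If your data contains floating-point values but you don't need the range or resolution of the default double-precision type, convert it to or save it as a single-precision type.
If your data is sufficiently sparse (i.e. there are far more zeros than non-zeros in the matrix), you can use the find function to get the row and column indices of the non-zero values, then save just those to your file.

Here are a couple of examples to illustrate: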

data = double(rand(16,2^20) <= 0.00001);  %# A large but very sparse matrix

%# Writing the values as type double:
fid = fopen('data_double.dat','w');  %# Open the file
fwrite(fid,size(data),'uint32');     %# Write the matrix size (2 values)
fwrite(fid,data,'double');           %# Write the data as type double
fclose(fid);                         %# Close the file

%# Writing the values as type uint8:
fid = fopen('data_uint8.dat','w');  %# Open the file
fwrite(fid,size(data),'uint32');    %# Write the matrix size (2 values)
fwrite(fid,data,'uint8');           %# Write the data as type uint8
fclose(fid);                        %# Close the file

%# Writing out only the non-zero values:
[rowIndex,columnIndex,values] = find(data);  %# Get the row and column indices
                                             %#   and the non-zero values
fid = fopen('data_sparse.dat','w');  %# Open the file
fwrite(fid,numel(values),'uint32');  %# Write the length of the vectors (1 value)
fwrite(fid,rowIndex,'uint32');       %# Write the row indices
fwrite(fid,columnIndex,'uint32');    %# Write the column indices
fwrite(fid,values,'uint8');          %# Write the non-zero values
fclose(fid);                         %# Close the file

The files created above will differ drastically in size. The file 'data_double.dat' will be about 131,073 kB, 'data_uint8.dat' will be about 16,385 kB, and 'data_sparse.dat' will be less than 2 kB.
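These sizes follow from simple arithmetic on the file layouts used in the examples. The sketch below reproduces the calculation; the number of non-zero values is only an expected value, since the example data is random, and the variable names are chosen for illustration.

```matlab
nElements   = 16 * 2^20;                      % number of matrix elements
headerBytes = 2 * 4;                          % two uint32 values for the matrix size

bytesDouble = headerBytes + nElements * 8;    % 8 bytes per double
bytesUint8  = headerBytes + nElements * 1;    % 1 byte per uint8

% Sparse layout: one uint32 count, then 4 + 4 + 1 bytes per non-zero value
expectedNonZeros = round(nElements * 0.00001);
bytesSparse = 4 + expectedNonZeros * (4 + 4 + 1);

fprintf('double: %.0f kB, uint8: %.0f kB, sparse: %.1f kB\n', ...
        bytesDouble / 1024, bytesUint8 / 1024, bytesSparse / 1024);
```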

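Note that I also wrote the matrix size and vector length to the files so that the data can be read back in (using fread) and reshaped properly. Note also that if I did not supply the 'double' or 'uint8' argument to fwrite, the values would be written as uint8, which is the default precision of fwrite. That is enough here because the values are all 0 and 1, but it would not preserve larger values such as general uint32 data.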
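Reading the files back could look roughly like the sketch below. It uses a small stand-in matrix and temporary file names (both assumptions made so the example runs quickly), but the file layouts are the same as in the writing examples. The sparse layout does not record the matrix size, so the sketch assumes it is known when reading.

```matlab
% Small stand-in for the 16-by-2^20 matrix, so the sketch runs quickly
data = double(rand(16, 1000) <= 0.01);

% Dense layout: [nRows nCols] as uint32, then the values as uint8
denseFile = [tempname '.dat'];                 % hypothetical file location
fid = fopen(denseFile, 'w');
fwrite(fid, size(data), 'uint32');
fwrite(fid, data, 'uint8');
fclose(fid);

fid = fopen(denseFile, 'r');
dims = fread(fid, 2, 'uint32');                % matrix size (2 values)
denseData = fread(fid, prod(dims), 'uint8');   % returned as double by default
fclose(fid);
denseData = reshape(denseData, dims(1), dims(2));

% Sparse layout: count, row indices, column indices, values
[rowIndex, columnIndex, values] = find(data);
sparseFile = [tempname '.dat'];                % hypothetical file location
fid = fopen(sparseFile, 'w');
fwrite(fid, numel(values), 'uint32');
fwrite(fid, rowIndex, 'uint32');
fwrite(fid, columnIndex, 'uint32');
fwrite(fid, values, 'uint8');
fclose(fid);

fid = fopen(sparseFile, 'r');
n = fread(fid, 1, 'uint32');                   % number of non-zero values
r = fread(fid, n, 'uint32');                   % row indices
c = fread(fid, n, 'uint32');                   % column indices
v = fread(fid, n, 'uint8');                    % non-zero values
fclose(fid);

% Rebuild the full matrix; its size is assumed to be known separately
sparseData = zeros(size(data));
sparseData(sub2ind(size(data), r, c)) = v;

delete(denseFile);
delete(sparseFile);
```

For real uint32 data, the values would need to be written and read with 'uint32' precision instead of 'uint8'.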

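How is the data generated? How do you need to access the data?

If I have calculated correctly, the variable takes up less than 200 MB if it is all double (16 × 2^20 × 8 bytes = 128 MB). Therefore, if you only need to access it from MATLAB, you can easily save and load it as a single .mat file.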

%# create data
data = zeros(16,2^20);

%# save data
save('myFile.mat','data');

%# clear data to test everything works
clear data

%# load data
load('myFile.mat')
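A variation on the same idea for the data described in the question, as a rough sketch: the matrix is kept as uint32 (4 bytes per element instead of 8), and the result of load is captured in a struct. It assumes MATLAB's default MAT-file format (version 7), which compresses data, so a mostly-zero matrix should take far less space on disk than in memory. The file name and the single non-zero entry are hypothetical.

```matlab
% Hypothetical data matching the question: mostly zeros, stored as uint32
data = zeros(16, 2^20, 'uint32');
data(5, 12345) = 7;

fileName = [tempname '.mat'];        % hypothetical file location
save(fileName, 'data');

% Compare the in-memory size with the size of the file on disk
info = dir(fileName);
fprintf('In memory: %d bytes, on disk: %d bytes\n', numel(data) * 4, info.bytes);

% Loading into a struct avoids overwriting workspace variables by accident
S = load(fileName);
roundTripOk = isequal(S.data, data) && isa(S.data, 'uint32');
delete(fileName);
```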

Licensed under: CC-BY-SA with attribution