Question

Does anyone here know how to delete a variable from a MATLAB .mat file? I know that you can add variables to an existing .mat file using save -append, but I can't find any documentation on how to delete variables from the file.

Before someone says, "just save it", it's because I'm saving intermediate processing steps to disk to alleviate memory problems, and in the end there will be almost 10 GB of intermediate data per analysis routine. Thanks!

Solution

Interestingly enough, you can use the -append option with SAVE to effectively erase data from a .mat file. Note this excerpt from the documentation, in particular the part about replacing the saved values of existing variables:

For MAT-files, -append adds new variables to the file or replaces the saved values of existing variables with values in the workspace.

In other words, if a variable in your .mat file is called A, you can save over that variable with a new copy of A (that you've set to []) using the -append option. There will still be a variable called A in the .mat file, but it will be empty, which reduces the total file size.

Here's an example:

>> A = rand(1000);            %# Create a 1000-by-1000 matrix of random values
>> save('savetest.mat','A');  %# Save A to a file
>> whos -file savetest.mat    %# Look at the .mat file contents
  Name         Size                Bytes  Class     Attributes

  A         1000x1000            8000000  double

The file size will be about 7.21 MB. Now do this:

>> A = [];                              %# Set the variable A to empty
>> save('savetest.mat','A','-append');  %# Overwrite A in the file
>> whos -file savetest.mat              %# Look at the .mat file contents
  Name      Size            Bytes  Class     Attributes

  A         0x0                 0  double

And now the file size will be around 169 bytes. The variable is still in there, but it is empty.
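
The same trick can blank several variables in one call. The following is only a minimal sketch: the file name blank_demo.mat and the variable names are made up for illustration, and it relies on save accepting '-struct' together with '-append'.

```matlab
% Hypothetical demo file in the temp directory (name is an assumption)
matFile = fullfile(tempdir, 'blank_demo.mat');
A = rand(100);
B = 42;
save(matFile, 'A', 'B');

% Build a struct whose fields are the variables to blank, each set to []
toBlank = {'A'};
s = struct();
for i = 1:numel(toBlank)
    s.(toBlank{i}) = [];
end

% Each field of s overwrites the variable of the same name in the file
save(matFile, '-struct', 's', '-append');

% A is now 0x0; B is untouched
info = whos('-file', matFile);
```

The variables still show up in whos -file, so this shrinks the file rather than truly removing the names.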

OTHER TIPS

10 GB of data? Updating multi-variable MAT files could get expensive due to MAT format overhead. Consider splitting the data up and saving each variable to a different MAT file, using directories for organization if necessary. Even if you had a convenient function to delete variables from a MAT file, it would be inefficient. The variables in a MAT file are laid out contiguously, so replacing one variable can require reading and writing much of the rest. If they're in separate files, you can just delete the whole file, which is fast.

To see this in action, try this code, stepping through it in the debugger while using something like Process Explorer (on Windows) to monitor its I/O activity.

function replace_vars_in_matfile

x = 1;
% Random dummy data; zeros would compress really well and throw off results
y = randi(intmax('uint8')-1, 100*(2^20), 1, 'uint8');

tic; save test.mat x y; toc;
x = 2;
tic; save -append test.mat x; toc;
y = y + 1;
tic; save -append test.mat y; toc;

On my machine, the results look like this. (Read and Write are cumulative, Time is per operation.)

                    Read (MB)      Write (MB)       Time (sec)
before any write:   25             0
first write:        25             105              3.7
append x:           235            315              3.6
append y:           235            420              3.8

Notice that updating the small x variable costs more I/O than updating the large y: appending x reads about 210 MB and writes about 210 MB, while appending y writes only another 105 MB and reads nothing. Much of this I/O activity is "redundant" housekeeping work to keep the MAT file format organized, and will go away if each variable is in its own file.
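
As a rough illustration of the one-file-per-variable layout, here is a minimal sketch. The directory name pervar_demo is an assumption made up for the example, and the data is much smaller than in the benchmark above.

```matlab
% Hypothetical directory holding one MAT file per variable
outDir = fullfile(tempdir, 'pervar_demo');
if ~exist(outDir, 'dir')
    mkdir(outDir);
end

x = 1;
y = randi([0 254], 2^20, 1, 'uint8');   % 1 MB of poorly compressible data

% Each variable goes to its own file
save(fullfile(outDir, 'x.mat'), 'x');
save(fullfile(outDir, 'y.mat'), 'y');

% Updating x rewrites only the tiny x.mat; y.mat is not touched
x = 2;
save(fullfile(outDir, 'x.mat'), 'x');

% "Deleting" y is just a file delete
delete(fullfile(outDir, 'y.mat'));
```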

Also, try to keep these files on the local filesystem; it'll be a lot faster than network drives. If they need to go on a network drive, consider doing the save() and load() on local temp files (maybe chosen with tempname()) and then copying them to/from the network drive. Matlab's save and load tend to be much faster with local filesystems, enough so that local save/load plus a copy can be a substantial net win.
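
A minimal sketch of the local-temp-file approach might look like this. The networkDir folder here is just a stand-in created under tempdir so the example runs anywhere; in practice it would be your network path.

```matlab
% Stand-in for a network share (hypothetical; replace with the real path)
networkDir = fullfile(tempdir, 'fake_network_share');
if ~exist(networkDir, 'dir')
    mkdir(networkDir);
end

y = rand(200);

% Save locally first, then copy the finished file to the share
localFile = [tempname '.mat'];
save(localFile, 'y');
copyfile(localFile, fullfile(networkDir, 'y.mat'));
delete(localFile);

% Loading goes the other way: copy to a local temp file, then load it
localCopy = [tempname '.mat'];
copyfile(fullfile(networkDir, 'y.mat'), localCopy);
loaded = load(localCopy);
delete(localCopy);
```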


Here's a basic implementation that will let you save variables to separate files using the familiar save() and load() signatures. They're prefixed with "d" to indicate they're the directory-based versions. They use some tricks with evalin() and assignin(), so I thought it would be worth posting the full code.

function dsave(file, varargin)
%DSAVE Like save, but each var in its own file
%
% dsave filename var1 var2 var3...
if nargin < 1 || isempty(file); file = 'matlab';  end
[tfStruct,loc] = ismember({'-struct'}, varargin);
args = varargin;
args(loc(tfStruct)) = [];
if ~all(cellfun(@isvarname, args))
    error('Invalid arguments. Usage: dsave filename <-struct> var1 var2 var3 ...');
end
if tfStruct
    structVarName = args{1};
    s = evalin('caller', structVarName);
else
    varNames = args;
    if isempty(args)
        w = evalin('caller','whos');
        varNames = { w.name };
    end
    captureExpr = ['struct(' ...
        join(',', cellfun(@(x){sprintf('''%s'',{%s}',x,x)}, varNames)) ')'];
    s = evalin('caller', captureExpr);
end

% Use Java checks to avoid partial path ambiguity
jFile = java.io.File(file);
if ~jFile.exists()
    ok = mkdir(file);
    if ~ok
        error('failed creating dsave dir %s', file);
    end
elseif ~jFile.isDirectory()
    error('Cannot save: destination exists but is not a dir: %s', file);
end
names = fieldnames(s);
for i = 1:numel(names)
    varFile = fullfile(file, [names{i} '.mat']);
    varStruct = struct(names{i}, {s.(names{i})});
    save(varFile, '-struct', 'varStruct');
end

function out = join(Glue, Strings)
%JOIN Concatenate strings, placing Glue between consecutive elements
Strings = cellstr(Strings);
if isempty( Strings )
    out = '';
elseif numel( Strings ) == 1
    out = Strings{1};
else
    Glue = sprintf( Glue ); % Support escape sequences
    % Append Glue to every element except the last, then concatenate
    out = strcat( Strings(1:end-1), { Glue } );
    out = [ out{:} Strings{end} ];
end

Here's the load() equivalent.

function out = dload(file,varargin)
%DLOAD Like load, but each var in its own file
if nargin < 1 || isempty(file); file = 'matlab'; end
varNames = varargin;
if ~exist(file, 'dir')
    error('Not a dsave dir: %s', file);
end
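if isempty(varNames)
    % Use dir() rather than ls(): ls() returns a char matrix on Windows
    % but a single whitespace-separated string on other platforms
    d = dir(fullfile(file, '*.mat'));
    varNames = regexprep({d.name}, '\.mat$', '');
end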

out = struct;
for i = 1:numel(varNames)
    name = varNames{i};
    tmp = load(fullfile(file, [name '.mat']));
    out.(name) = tmp.(name);
end

if nargout == 0
    for i = 1:numel(varNames)
        assignin('caller', varNames{i}, out.(varNames{i}));
    end
    clear out
end

dwhos() is the equivalent of whos('-file').

function out = dwhos(file)
%DWHOS List variable names in a dsave dir
if nargin < 1 || isempty(file); file = 'matlab'; end
% dir() behaves the same on all platforms, unlike ls()
d = dir(fullfile(file, '*.mat'));
out = regexprep({d.name}, '\.mat$', '');

And ddelete() to delete the individual variables like you asked.

function ddelete(file,varargin)
%DDELETE Delete variables from a dsave dir
if nargin < 1 || isempty(file); file = 'matlab'; end
varNames = varargin;
for i = 1:numel(varNames)
    delete(fullfile(file, [varNames{i} '.mat']));
end

The only way of doing this that I know is to use the MAT-file API function matDeleteVariable. It would, I guess, be quite easy to write a Fortran or C routine to do this, but it does seem like a lot of effort for something that ought to be much easier.

I suggest you load the variables you want to keep from the .mat file, and save them to a new .mat file. If necessary, you can load and save (using '-append') in a loop.

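%# variablesYouWantToKeep is a cell array of variable names, e.g. {'A','B'}
S = load(filename, '-mat', variablesYouWantToKeep{:});
save(newFilename, '-struct', 'S');  %# note: the struct's name goes in quotes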
%# then you can delete the old file
delete(filename)
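
For data that won't fit in memory all at once, the loop variant mentioned above could be sketched like this. The file names and the variables A, B, and C are made up for the demo; only one variable is held in memory at a time.

```matlab
% Hypothetical source and destination files (names are assumptions)
srcFile = fullfile(tempdir, 'keep_demo_src.mat');
newFile = fullfile(tempdir, 'keep_demo_new.mat');

A = rand(50); B = rand(50); C = 'some text';
save(srcFile, 'A', 'B', 'C');

keep = {'A', 'C'};   % everything else (here: B) is dropped
for i = 1:numel(keep)
    S = load(srcFile, '-mat', keep{i});   % load one variable at a time
    if i == 1
        save(newFile, '-struct', 'S');              % creates/overwrites newFile
    else
        save(newFile, '-struct', 'S', '-append');   % adds to newFile
    end
end

% Replace the old file with the trimmed copy
movefile(newFile, srcFile, 'f');
```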
Licensed under: CC-BY-SA with attribution