Question

I have a data file with a varying number of values per line that I would like to load into MATLAB as an array. As an example, suppose the data file looks like

1 2
3 4 5 6
7
8 9 10

I want to read it into Matlab as an array that looks like

1  2  nan nan
3  4   5   6
7 nan nan nan
8  9  10  nan

I can do this with a for loop over all lines of the file, but my files are very large and I am looking for an efficient solution. Any ideas would be highly appreciated. If it helps, I also know an upper bound on the maximum line length across the file.


Solution

While Divakar's answer works if you don't have any zero values in your text file, that may not generally be the case. For example, if your text file contained

1 2 3
4 0
5 6 0 7 8

then Divakar's result would be:

1 2 3 nan nan
4 nan nan nan nan
5 6 nan 7 8

whereas you really want:

1 2 3 nan nan
4 0 nan nan nan
5 6 0 7 8

The easiest way to implement this is to open the dlmread function (just type dlmread into the editor and press Ctrl+D to open it). Save a copy under a different name (e.g. dlmread_nan.m) in the directory you're working in, and make the change below in that copy rather than in the original.

Go down to this part of the code (line 126 in my version):

if isempty(delimiter)
    result  = textscan(fid,'',nrows,'headerlines',r,'headercolumns',c,...
                       'returnonerror',0,'emptyvalue',0, 'CollectOutput', true);
else
    delimiter = sprintf(delimiter);
    whitespace  = setdiff(sprintf(' \b\t'),delimiter);
    result  = textscan(fid,'',nrows,...
                   'delimiter',delimiter,'whitespace',whitespace, ...
                   'headerlines',r,'headercolumns',c,...
                   'returnonerror',0,'emptyvalue',0,'CollectOutput', true);
end

and change the value after 'emptyvalue' in both cases to NaN instead of 0. Save the file. It should look like this:

if isempty(delimiter)
    result  = textscan(fid,'',nrows,'headerlines',r,'headercolumns',c,...
                       'returnonerror',0,'emptyvalue',NaN, 'CollectOutput', true);
else
    delimiter = sprintf(delimiter);
    whitespace  = setdiff(sprintf(' \b\t'),delimiter);
    result  = textscan(fid,'',nrows,...
                   'delimiter',delimiter,'whitespace',whitespace, ...
                   'headerlines',r,'headercolumns',c,...
                   'returnonerror',0,'emptyvalue',NaN,'CollectOutput', true);
end

To get your array, use this:

result = dlmread_nan('text.txt', ' ');
% This will give you exactly what you're looking for.

It's a little cumbersome, but by copying from MATLAB's library, it'll probably be a lot more robust and error-free than writing it from scratch yourself.
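
If editing a copy of dlmread feels heavy-handed, the same textscan call can also be made directly. The following is a minimal sketch rather than a drop-in replacement: it assumes a space-delimited file with no header rows or columns to skip, mirrors the options from the dlmread snippet above, and writes a small temporary file (created here only for illustration) so that it runs on its own.

```matlab
% Write a small ragged test file (stand-in for your real data file).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 2 3\n4 0\n5 6 0 7 8');
fclose(fid);

% Same options as the modified dlmread branch for a non-empty delimiter,
% with the delimiter assumed to be a single space.
fid = fopen(fname, 'r');
c = textscan(fid, '', ...
             'Delimiter', ' ', 'Whitespace', sprintf('\b\t'), ...
             'ReturnOnError', false, 'EmptyValue', NaN, ...
             'CollectOutput', true);
fclose(fid);
delete(fname);

result = c{1};  % numeric matrix; empty fields are filled with NaN
disp(result)
```

Because this skips dlmread's argument checking and range handling, it is probably best suited to files whose layout you already trust.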

OTHER TIPS

Case 1: Data with no zeros

This technique uses dlmread, which fills the empty places with zeros. Those zeros are then converted to NaNs to obtain the desired output.

Code

out = dlmread(textfile_path, ' ');  % textfile_path is the path to your text file
out(out==0) = NaN                   % out is your desired output

Input

1 2
3 4 5 6
7
8 9 10

Output

out =
     1     2   NaN   NaN
     3     4     5     6
     7   NaN   NaN   NaN
     8     9    10   NaN
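
Note that this conversion cannot tell a padding zero from a genuine one. A small sketch of the effect, using a literal matrix as an assumed stand-in for what dlmread returns for a file that contains real zeros:

```matlab
% Assumed dlmread result for the lines "1 2 3", "4 0", "5 6 0 7 8"
% (short lines padded with zeros).
out = [1 2 3 0 0; 4 0 0 0 0; 5 6 0 7 8];

out(out==0) = NaN;   % replaces padding zeros AND genuine zeros

% The real zeros at (2,2) and (3,3) have been overwritten as well.
lost = isnan(out(2,2)) && isnan(out(3,3));
disp(out)
```

This is why Case 1 is only safe when the data is known to contain no zeros.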

Case 2: Data with zeros, which warrants extra care to preserve those zeros

This approach imports the data with textscan into a cell array of column vectors, which preserves zeros as they are and turns the empty places into NaNs. The only issue is the final line: if the last line isn't the longest one, there are no empty places after its last value, so the columns beyond that point end up shorter than the number of lines. A few extra lines of code are needed to handle this.

Approach 1:

Code

fid = fopen(textfile_path,'r');  % textfile_path is the path to your text file
data1 = textscan(fid,'');
fclose(fid);

lens = cellfun(@numel,data1);      % number of values read into each column
out = NaN(max(lens),numel(lens));  % out will be your output
for k = 1:numel(lens)
    out(1:lens(k),k) = data1{k};   % copy column k; missing rows stay NaN
end

Approach 2 (Shorter version):

Code

fid = fopen(textfile_path,'r');  % textfile_path is the path to your text file
data1 = textscan(fid,'');
fclose(fid);

% Index of the last full-length column (assumes the last line is not the longest)
n1 = find(diff(cellfun(@numel,data1))~=0);

% out will be your output
out = [horzcat(data1{1:n1}) [horzcat(data1{n1+1:end}) ; NaN(1,numel(data1)-n1)]]

Input

1 2 3
4 0
5 6 0 7 8
0 0

Output

out =
     1     2     3   NaN   NaN
     4     0   NaN   NaN   NaN
     5     6     0     7     8
     0     0   NaN   NaN   NaN
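
To see what the padding step does without a file on disk, here is a small sketch that applies the Approach 1 loop to a hand-built cell array. The cell array is an assumption: it mimics the shape described above for the textscan result on this input (full-length columns for the first two fields, columns one element short after that).

```matlab
% Assumed stand-in for data1 = textscan(fid,'') on the input above.
data1 = {[1;4;5;0], [2;0;6;0], [3;NaN;0], [NaN;NaN;7], [NaN;NaN;8]};

lens = cellfun(@numel, data1);        % lengths of the column vectors
out = NaN(max(lens), numel(lens));    % preallocate with NaN padding
for k = 1:numel(lens)
    out(1:lens(k), k) = data1{k};     % short columns leave the last row as NaN
end
disp(out)
```

The loop runs over columns rather than lines, so its cost depends on the longest line and not on the number of lines in the file.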

What about this case?

Input

1 2 3   6
4   5
1   0 7 8
0 0     5

How can this be handled?

textscan is risky here because it does not preserve the column position of each number.
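
One possible way to handle such position-dependent input is to treat it as fixed-width text. The sketch below is only an illustration under stated assumptions: every field is assumed to occupy exactly two characters (a digit followed by a space), and the lines are assumed to have already been read into a cell array of strings, for which a literal stand-in is used here. The names lines, fieldWidth, and txt are chosen for this example.

```matlab
% Stand-in for the lines of the file, already read as text.
lines = {'1 2 3   6'; '4   5'; '1   0 7 8'; '0 0     5'};
fieldWidth = 2;   % assumption: each field is 2 characters wide

txt = char(lines);                          % char matrix, short rows padded with spaces
nCols = ceil(size(txt,2) / fieldWidth);     % number of fixed-width fields
txt(:, end+1 : nCols*fieldWidth) = ' ';     % pad width to a multiple of fieldWidth

out = NaN(size(txt,1), nCols);
for k = 1:nCols
    field = txt(:, (k-1)*fieldWidth + (1:fieldWidth));  % characters of field k, all rows
    out(:,k) = str2double(cellstr(field));              % blank fields become NaN
end
disp(out)
```

The loop again runs over fields rather than lines. If the real file uses a different or varying field width, the slicing would need to be adapted accordingly.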

Licensed under: CC-BY-SA with attribution