Question

I've written a script that saves its output to a CSV file for later reference, but the second script, which imports the data, takes an unreasonably long time to read it back in.

The data is in the following format:

Item1,val1,val2,val3
Item2,val4,val5,val6,val7
Item3,val8,val9

where the headers are on the left-most column, and the data values take up the remainder of the row. One major difficulty is that the arrays of data values can be different lengths for each test item. I'd save it as a structure, but I need to be able to edit it outside the MATLAB environment, since sometimes I have to delete rows of bad data on a computer that doesn't have MATLAB installed. So really, part one of my question is: Should I save the data in a different format?

Second part of the question: I've tried importdata, csvread, and dlmread, but I'm not sure which is best, or if there's a better solution. Right now I'm using my own script using a loop and fgetl, which is horribly slow for large files. Any suggestions?

function [data,headers]=csvreader(filename) %V1_1
    fid=fopen(filename,'r');
    data={};
    headers={};
    count=1;
    while 1
        textline=fgetl(fid);
        if ~ischar(textline), break, end   % fgetl returns -1 at end of file
        % copy characters into the header one at a time until the first comma
        nextchar=textline(1);
        idx=1;
        while nextchar~=','
            headers{count}(idx)=textline(1);
            idx=idx+1;
            textline(1)=[];
            nextchar=textline(1);
        end
        textline(1)=[];                    % drop the comma
        data{count}=str2num(textline);     % the rest of the line is the numeric row
        count=count+1;
    end
    fclose(fid);

(I know this is probably terribly written code - I'm an engineer, not a programmer, please don't yell at me - any suggestions for improvement would be welcome, though.)

Solution

It would probably make the data easier to read if you could pad the file with NaN values when your first script creates it:

Item1,1,2,3,NaN
Item2,4,5,6,7
Item3,8,9,NaN,NaN

or you could even just print empty fields:

Item1,1,2,3,
Item2,4,5,6,7
Item3,8,9,,

Of course, in order to pad properly you would need to know the maximum number of values across all the items beforehand. With either format above, you could then use one of the standard file reading functions, such as TEXTSCAN:

>> fid = fopen('uneven_data.txt','rt');
>> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
>> fclose(fid);
>> C{1}

ans = 

    'Item1'
    'Item2'
    'Item3'

>> C{2}

ans =

     1     2     3   NaN  %# TEXTSCAN sets empty fields to NaN anyway
     4     5     6     7
     8     9   NaN   NaN
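The padding step and the read-back could be sketched end to end as follows. This is only a minimal sketch under stated assumptions: the sample headers and values are made up for illustration, and the variable names (maxLen, nCols, names, values) are hypothetical. It also assumes the real data never contains NaN, since NaN is what marks the padding.

```matlab
% --- Writing side (first script) ---
headers = {'Item1','Item2','Item3'};      % hypothetical sample data
data    = {[1 2 3], [4 5 6 7], [8 9]};
maxLen  = max(cellfun(@numel, data));     % the longest row sets the column count

fid = fopen('uneven_data.txt','wt');
for k = 1:numel(data)
    row = nan(1, maxLen);                 % start with a row of NaN padding ...
    row(1:numel(data{k})) = data{k};      % ... then fill in the real values
    fprintf(fid, '%s', headers{k});
    fprintf(fid, ',%.15g', row);          % the format is reused for every element
    fprintf(fid, '\n');
end
fclose(fid);

% --- Reading side (second script) ---
fid = fopen('uneven_data.txt','rt');
firstLine = fgetl(fid);
nCols = sum(firstLine == ',');            % every row is padded to the same width
frewind(fid);
fmt = ['%s' repmat(' %f', 1, nCols)];     % one %f per value column
C = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);

names = C{1};
M = C{2};

% Strip the padding to get back one variable-length vector per item
values = cell(size(M,1), 1);
for k = 1:size(M,1)
    values{k} = M(k, ~isnan(M(k,:)));
end
```

Counting the commas in the first line means the reading script does not need to know the column count in advance, as long as every row was padded to the same width when the file was written.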

OTHER TIPS

Instead of parsing the string textline one character at a time, you could use strtok to break the string up. For example:

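% This goes inside the outer while loop of csvreader, replacing its body
stringParts = {};
tline = fgetl(fid);
if ~ischar(tline), break, end

% split the line at each comma
i = 1;
while 1
    [stringParts{i}, r] = strtok(tline, ',');
    tline = r;
    i = i + 1;
    if isempty(r), break; end
end

% store the header
headers{count} = stringParts{1};

% convert the data into numbers
data{count} = zeros(1, length(stringParts)-1);   % preallocate the row
for j = 2:length(stringParts)
    data{count}(j-1) = str2double(stringParts{j});
end
count = count + 1;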
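A variation on the same idea, if the file can stay unpadded, is to read the whole file in one call and let sscanf parse each row's values in one go instead of token by token. This is a rough sketch, not a benchmarked solution: the file name ragged_demo.csv and the sample contents are made up so the example is self-contained, and whether it is actually faster on your files is something you would need to time.

```matlab
% Create a small sample file (hypothetical name) so the sketch runs on its own
fid = fopen('ragged_demo.csv','wt');
fprintf(fid, 'Item1,1,2,3\nItem2,4,5,6,7\nItem3,8,9\n');
fclose(fid);

txt   = fileread('ragged_demo.csv');          % read the whole file in one call
lines = regexp(txt, '\r?\n', 'split');        % split the text into lines
lines = lines(~cellfun(@isempty, lines));     % drop blank lines

n = numel(lines);
headers = cell(n,1);
data    = cell(n,1);
for k = 1:n
    % the header is everything before the first comma; rest = ',v1,v2,...'
    [headers{k}, rest] = strtok(lines{k}, ',');
    % the ',%f' format is reapplied until the text runs out
    data{k} = sscanf(rest, ',%f').';
end
```

The outputs have the same shape as in the original csvreader (one header and one variable-length numeric vector per row), so the rest of the importing script should not need to change.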

I had the same problem reading CSV data in MATLAB and was surprised by how little support there is for it, but then I found the Import Data tool. I'm on R2015b.

On the top bar in the "Home" tab, click on "Import Data" and choose the file you'd like to read. An app window will come up like this:

[Screenshot: Import Data tool]

Under "Import Selection" you have the option to "Generate Function", which gives you quite a few customization options, including how to fill empty cells and what you'd like the output data structure to be. Plus it's written by MathWorks, so it's probably using the fastest available method to read CSV files. It was almost instantaneous on my file.

Q1) If you know the maximum number of columns, you can fill empty entries with NaN. Also, if all values are numeric, do you really need the "Item#" column? If so, you could store only the "#", so that all the data is numeric.

Q2) The fastest way to read numeric data from a file without MEX-files is csvread. I try to avoid using strings in CSV files, but if I have to, I use my csv2cell function:

http://www.mathworks.com/matlabcentral/fileexchange/20135-csv2cell

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow