I would assume that there is an error in the data, or that the format pattern does not match the data. Try printing the lines around the failure:
file_id=fopen('CRSP.csv');
for idx=1:1424456
    fgetl(file_id); % skip the lines before the suspected error
end
for idx=1:10
    fprintf('%s\n',fgetl(file_id)); % print the next 10 lines
end
fclose(file_id);
If there is an error, it should be at the 2nd or 3rd printed line. Anything special there? Maybe a COMNAM with some special character?
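A quick way to spot such a character is to list everything in a line that the pattern does not expect. This is only a minimal sketch: the sample line is hypothetical (use a line returned by fgetl instead), and the set of allowed characters is my assumption based on the fields described here.

```matlab
% Hypothetical sample line; in practice use a line returned by fgetl(file_id).
sample = '01/31/2000, ACME & CO, XYZ, 1, 2, 3, 4, 0.5, 1.25';

% Assumed set of legitimate characters: word characters, spaces,
% and the separators "," "." "/". Everything else is reported.
[bad, pos] = regexp(sample, '[^\w ,./]', 'match', 'start');

for k = 1:numel(bad)
    fprintf('Unexpected character "%s" (code %d) at column %d\n', ...
            bad{k}, double(bad{k}), pos(k));
end
```

If nothing is printed, the line contains only characters from the assumed set, and the problem is more likely in the field layout.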
To read the file, I would use the following code, which reads it line by line:
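file_id=fopen('CRSP.csv');
line=fgetl(file_id);
data={};
ix=1; % line counter
% date, two text fields, four integer fields, two decimal fields
pattern='(\d\d/\d\d/\d\d\d\d)\s*, ([\w ]+), ([\w ]+), ([\d]+), ([\d]+), ([\d]+), ([\d]+), ([\d \.]+), ([\d \.]+)';
while ischar(line) % fgetl returns -1 at the end of the file
    [parsed,sindex,eindex] = regexpi(line,pattern,'tokens','start','end');
    % accept only a single match that covers the whole line
    if ~isempty(sindex)&&numel(sindex)==1&&(sindex==1)&&(eindex==numel(line))
        data{end+1}=parsed{1};
    else
        fprintf('Unable to parse line %d with content: %s\n',ix,line);
    end
    line=fgetl(file_id);
    ix=ix+1;
end
fclose(file_id);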
Short summary of the regular expressions used:
'(...)' Everything between the parentheses is a "token", which is returned
'([\d \.]+)' Digits, white space and "."
'([\d]+)' Digits only
'([\w ]+)' Word characters (letters, digits, underscore), including white space
'(\d\d/\d\d/\d\d\d\d)' Date
This expression is a bit lenient. It accepts not only "0.000" as a number but also "0.0 00." and similar combinations, but it should be enough to detect all errors. If not, the expression has to be improved.
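If a stricter check is needed, the numeric token could be tightened along these lines. This is a sketch, assuming the values are non-negative decimals without exponents or thousands separators; the variable names are mine.

```matlab
% Stricter numeric token: digits, optionally followed by "." and more digits.
% Assumption: no sign, no exponent, no thousands separator.
num = '(\d+(?:\.\d+)?)';

samples = {'0.000', '12', '0.0 00.', '1..2'};
for k = 1:numel(samples)
    % Anchor with ^ and $ so that the whole string has to match
    ok = ~isempty(regexp(samples{k}, ['^' num '$'], 'once'));
    fprintf('%-10s accepted: %d\n', samples{k}, ok);
end
```

With this token, "0.000" and "12" are accepted, while "0.0 00." and "1..2" are rejected. To use it in the line pattern, replace each '([\d \.]+)' with num.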