Pergunta

Hi I keep getting an error with this:

%% generate sample data
K = 3;
numObservarations = 12000;
dimensions = 20;
data = fopen('M.dat','rt');
C = textscan(data,[numObservarations dimensions]);

??? Error using ==> textscan Second input must be empty or a format string.

I tryed this method:

%% format data
%# read the list of features
fid = fopen('kddcup.names','rt');
C = textscan(fid, '%s %s', 'Delimiter',':', 'HeaderLines',1);
fclose(fid);

%# determine type of features
C{2} = regexprep(C{2}, '.$','');              %# remove "." at the end
attribNom = [ismember(C{2},'symbolic');true]; %# nominal features

%# build format string used to read/parse the actual data
frmt = cell(1,numel(C{1}));
frmt( ismember(C{2},'continuous') ) = {'%f'}; %# numeric features: read as number
frmt( ismember(C{2},'symbolic') ) = {'%s'};   %# nominal features: read as string
frmt = [frmt{:}];
frmt = [frmt '%s'];                           %# add the class attribute

%# read dataset
fid = fopen('kddcup.data_10_percent_corrected','rt');
C = textscan(fid, frmt, 'Delimiter',',');
fclose(fid);

%# convert nominal attributes to numeric
ind = find(attribNom);
G = cell(numel(ind),1);
for i=1:numel(ind)
    [C{ind(i)},G{i}] = grp2idx( C{ind(i)} );
end

%# all numeric dataset
M = cell2mat(C);
data = M;

%% generate sample data
K = 3;
numObservarations = 12000;
dimensions = 20;
data = textscan([numObservarations dimensions]);

%% cluster
opts = statset('MaxIter', 500, 'Display', 'iter');
[clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options',opts, ...
'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3);

%% plot data+clusters
figure, hold on
scatter3(data(:,1),data(:,2),data(:,3), 50, clustIDX, 'filled')
scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 200, (1:K)', 'filled')
hold off, xlabel('x'), ylabel('y'), zlabel('z')

%% plot clusters quality
figure
[silh,h] = silhouette(data, clustIDX);
avrgScore = mean(silh);
%% Assign data to clusters
% calculate distance (squared) of all instances to each cluster centroid
D = zeros(numObservarations, K);     % init distances
for k=1:K
%d = sum((x-y).^2).^0.5
D(:,k) = sum( ((data - repmat(clusters(k,:),numObservarations,1)).^2), 2);
end

% find  for all instances the cluster closet to it
[minDists, clusterIndices] = min(D, [], 2);
% compare it with what you expect it to be
sum(clusterIndices == clustIDX)

but get the error:

??? Error using ==> textscan
Invalid file identifier.  Use fopen to
generate a valid file identifier.

Error in ==> kmeans at 37
data = textscan([numObservarations
dimensions]);
Foi útil?

Solução

Your call of textscan is not compliant with the syntax required. The following signatures are valid:

C = textscan(fid, 'format')
C = textscan(fid, 'format', N)
C = textscan(fid, 'format', 'param', value)
C = textscan(fid, 'format', N, 'param', value)
C = textscan(str, ...)
[C, position] = textscan(...)
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top