Accuracy value constant even after different runs

https://datascience.stackexchange.com/questions/13286

16-10-2019

Question

I am using MATLAB's Neural Network Toolbox to train a network. My code is as follows:

x = xdata.';
t = target1';
% Create a Pattern Recognition Network
hiddenLayerSize = 10;
net = patternnet(hiddenLayerSize);
net.input.processFcns = {'removeconstantrows','mapminmax'};
net.output.processFcns = {'removeconstantrows','mapminmax'};
net.layers{2}.transferFcn = 'softmax';
net.divideFcn = 'dividerand';  % Divide data randomly
net.divideMode = 'sample';  % Divide up every sample
net.divideParam.trainRatio = 60/100;
net.divideParam.valRatio = 20/100;
net.divideParam.testRatio = 20/100;
net.trainFcn = 'trainscg';  % Scaled conjugate gradient
net.performFcn = 'mse';
net.performParam.regularization = 0.5;
%net.performParam.normalization = 0.01;
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
  'plotregression', 'plotfit', 'plotconfusion'};

% Train the Network
[net,tr] = train(net,x,t);
% Test the Network
y = net(x);
e = gsubtract(t,y);
tind = vec2ind(t);
yind = vec2ind(y);
percentErrors = sum(tind ~= yind)/numel(tind);
performance = perform(net,t,y)
% Recalculate Training, Validation and Test Performance
trainTargets = t .* tr.trainMask{1};
valTargets = t  .* tr.valMask{1};

I expected to get a different accuracy on each run, since the division of the dataset into training, validation, and test sets is random. Instead I get a constant accuracy (89.7%). The variable 'xdata' contains only the features selected by a feature selection algorithm. Is there any reason why my accuracy value is constant?

I have also trained an SVM on the same dataset. There, too, I get the same accuracy (94%) across multiple runs.

The output y contains 2 values. What do those values signify?

Answer

I see that you have set a random number generator (rng) seed with the following line in your code: rng(1).

With a fixed seed, the data is split in exactly the same way on every run. That is why you are getting the same error values.

Try removing that line. The data will then be split differently on each run (as there is no fixed seed any more), and you should get a different error each time, depending on how the split turns out.
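To illustrate the effect of the seed, here is a minimal sketch using only rng and randperm. It assumes that the random data division draws from MATLAB's global random number generator, which is what the explanation above implies. The variable names are made up for the illustration, and randperm only stands in for the actual train/validation/test split.

```matlab
% Sketch: a fixed seed makes "random" draws repeat exactly.
rng(1);                      % fixed seed, as in the line referred to above
firstSplit = randperm(10);   % stand-in for a random ordering of 10 samples
rng(1);                      % reset to the same seed
secondSplit = randperm(10);  % repeats the first ordering exactly
sameWithSeed = isequal(firstSplit, secondSplit);  % true

rng('shuffle');              % seed from the current time instead
thirdSplit = randperm(10);   % will usually differ from firstSplit
```

Calling rng(1) at the top of a script has the same effect on everything that follows: each run starts from the same generator state, so it produces the same sequence of random numbers.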

A model that generalizes well should be robust to the choice of seed.
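One way to check that robustness is to train with several seeds and look at the spread of the test accuracy. The following is only a sketch under a few assumptions: the Neural Network (Deep Learning) Toolbox is available, the seed list 1:5 is arbitrary, and the synthetic x and t are placeholders for the real data from the question (features-by-samples inputs and one-hot targets).

```matlab
% Sketch: how much does test accuracy vary with the seed?
% Placeholder data; replace x and t with the real inputs and targets.
rng(0);
x = rand(4, 200);                      % 4 features, 200 samples (synthetic)
t = full(ind2vec(randi(2, 1, 200)));   % one-hot targets for 2 classes (synthetic)

seeds = 1:5;                           % arbitrary choice of seeds
testAcc = zeros(size(seeds));
for k = 1:numel(seeds)
    rng(seeds(k));                     % different seed for each run
    net = patternnet(10);              % same hidden layer size as in the question
    net.trainParam.showWindow = false; % suppress the training window
    [net, tr] = train(net, x, t);      % random division and initial weights depend on the seed
    y = net(x(:, tr.testInd));         % predict on the held-out test samples only
    tTest = t(:, tr.testInd);
    testAcc(k) = mean(vec2ind(y) == vec2ind(tTest));
end
fprintf('test accuracy: mean %.3f, std %.3f\n', mean(testAcc), std(testAcc));
```

A small standard deviation across seeds suggests the result does not depend much on one particular split. Note that this sketch scores only the test indices recorded in tr, whereas the code in the question evaluates the network on all of x.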

License: CC-BY-SA with attribution
