I haven't had a chance to look at the adapt
function yet, but I suspect it updates the existing weights instead of overwriting them. To verify this, you can select a subset of your first data chunk as the second chunk in training. If training is overwriting, then when you use the trained net to test the first data chunk, it should predict poorly on the data that do not belong to the subset.
I tested it with a very simple program: train the curve y = x^2. During the first training pass, the net learns the data set [1, 3, 5, 7, 9]:
m=6;
P=[1 3 5 7 9];
T=P.^2;
[Pn,minP,maxP,Tn,minT,maxT] = premnmx(P,T);
clear net
net.IW{1,1}=zeros(m,1);
net.LW{2,1}=zeros(1,m);
net.b{1,1}=zeros(m,1);
net.b{2,1}=zeros(1,1);
% note: the four assignments above are discarded by the next call,
% because newff creates a fresh network and initializes its own weights
net=newff(minmax(Pn),[m,1],{'logsig','purelin'},'trainlm');
net.trainParam.show =100;
net.trainParam.lr = 0.09;  % note: trainlm ignores lr; it adapts mu instead
net.trainParam.epochs =1000;
net.trainParam.goal = 1e-3;
[net,tr]=train(net,Pn,Tn);
Tn_predicted= sim(net,Pn)
Tn
The result (note that the outputs are scaled with the same reference; if you are doing standard normalization instead, make sure you always apply the mean and std from the 1st training set to all the rest):
Tn_predicted =
-1.0000 -0.8000 -0.4000 0.1995 1.0000
Tn =
-1.0000 -0.8000 -0.4000 0.2000 1.0000
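On that normalization point: the rule is to compute the scaling statistics once, on the first training chunk, and reuse them for every later chunk. A minimal Python sketch of the standard-normalization variant (numpy only, names are mine, not toolbox code):

```python
import numpy as np

# statistics computed ONCE, on the first training chunk
P1 = np.array([1.0, 3, 5, 7, 9])
mu, sigma = P1.mean(), P1.std()

def zscore(chunk, mu, sigma):
    # apply the FIRST chunk's statistics to every later chunk,
    # so all chunks live on the same scale
    return (chunk - mu) / sigma

P2 = np.array([1.0, 9])          # second chunk, a subset of the first
P1n = zscore(P1, mu, sigma)
P2n = zscore(P2, mu, sigma)
print(P2n)  # matches the corresponding entries of P1n
```

This is exactly what tramnmx does for the min-max case: it reapplies minP/maxP from the first chunk rather than recomputing them.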
Now we run the second training pass, with the training data [1, 9]:
Pt=[1 9];
Tt=Pt.^2;
n=length(Pt);
Ptn = tramnmx(Pt,minP,maxP);
Ttn = tramnmx(Tt,minT,maxT);
[net,tr]=train(net,Ptn,Ttn);
Tn_predicted= sim(net,Pn)
Tn
The result:
Tn_predicted =
-1.0000 -0.8000 -0.4000 0.1995 1.0000
Tn =
-1.0000 -0.8000 -0.4000 0.2000 1.0000
Note that the points x = [3, 5, 7] are still predicted precisely. However, if we instead train on only x = [1, 9] from the very beginning:
clear net
net.IW{1,1}=zeros(m,1);
net.LW{2,1}=zeros(1,m);
net.b{1,1}=zeros(m,1);
net.b{2,1}=zeros(1,1);
% note: newff re-initializes the weights, so the zeroing above has no effect
net=newff(minmax(Ptn),[m,1],{'logsig','purelin'},'trainlm');
net.trainParam.show =100;
net.trainParam.lr = 0.09;  % again, trainlm ignores lr
net.trainParam.epochs =1000;
net.trainParam.goal = 1e-3;
[net,tr]=train(net,Ptn,Ttn);
Tn_predicted= sim(net,Pn)
Tn
Watch the result:
Tn_predicted =
-1.0071 -0.6413 0.5281 0.6467 0.9922
Tn =
-1.0000 -0.8000 -0.4000 0.2000 1.0000
Note that this net performs poorly on x = [3, 5, 7].
The test above indicates that training continues from the previous net instead of restarting. The reason you get worse performance is that you make only one pass over each data chunk (incremental gradient descent rather than batch gradient descent), so the total error may not have converged yet. If you have only two data chunks, you may need to re-train on chunk 1 after finishing chunk 2, then re-train on chunk 2, then chunk 1, and so on until some stopping condition is met. If you have many, many more chunks, you may not need to worry about the effect of the 2nd pass relative to the 1st. Online learning simply drops the previous data sets, regardless of whether the updated weights compromise performance on them.
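Toolbox aside, the same mechanism can be sketched with a toy model in plain Python (numpy only; the helper names are mine): fit the min-max-scaled curve with a quadratic model by batch gradient descent, then train on the subset x = [1, 9], either continuing from the chunk-1 weights or restarting from zeros. Continuing preserves the fit on x = [3, 5, 7]; restarting does not, because two points cannot constrain three weights.

```python
import numpy as np

def features(xn):
    # quadratic feature map [xn^2, xn, 1]
    return np.stack([xn**2, xn, np.ones_like(xn)], axis=1)

def train_gd(w, xn, yn, lr=0.1, epochs=5000):
    # plain batch gradient descent on mean squared error; crucially,
    # it starts from the weights passed in, so calling it again with
    # a new chunk continues from the previous solution
    X, n = features(xn), len(xn)
    for _ in range(epochs):
        grad = 2.0 / n * X.T @ (X @ w - yn)
        w = w - lr * grad
    return w

# chunk 1: x = [1 3 5 7 9], y = x.^2, min-max scaled to [-1, 1]
x = np.array([1.0, 3, 5, 7, 9]); y = x**2
xn = (x - 5) / 4           # mimics premnmx scaling of the inputs
yn = (y - 41) / 40         # and of the targets
w = train_gd(np.zeros(3), xn, yn)

# chunk 2: x = [1 9] -> xn = [-1, 1], scaled with the SAME parameters
xn2, yn2 = xn[[0, 4]], yn[[0, 4]]

# (a) continue from w: the earlier fit on x = [3 5 7] is preserved
w_cont = train_gd(w.copy(), xn2, yn2)

# (b) restart from zeros: two points cannot pin down three weights,
# so the middle of the curve is fit poorly
w_scratch = train_gd(np.zeros(3), xn2, yn2)

err_cont = np.max(np.abs(features(xn) @ w_cont - yn))
err_scratch = np.max(np.abs(features(xn) @ w_scratch - yn))
print(err_cont, err_scratch)  # continued fit stays accurate; restart does not
```

The restarted fit collapses to the straight line through the two endpoints, which is the same failure mode as in the MATLAB run above: the subset alone does not determine the curve, and only the carried-over weights remember the rest of the data.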