Question

When fitting a neural network to a dataset with the R function nnet, I learned that if the cases are unevenly distributed across classes, I should weight each case appropriately (http://cowlet.org/2014/01/12/understanding-data-science-classification-with-neural-networks-in-r.html).
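One common way to weight cases for imbalanced classes is inverse class frequency: every case gets 1 / (proportion of its class), so each class contributes the same total weight to the fit. This is the opposite of the experiment's myWeight, which gives the minority class 1 weight p1 = 0.05 and therefore amplifies the imbalance rather than correcting it. The helper balanced_weights below is hypothetical and not part of nnet. This is a minimal sketch:

```r
# Hypothetical helper (not part of nnet): inverse-class-frequency case weights.
balanced_weights <- function(y) {
  y <- factor(y)
  prop <- table(y) / length(y)                 # proportion of cases in each class
  w <- as.numeric(1 / prop[as.character(y)])   # each case gets 1 / (its class proportion)
  w / mean(w)                                  # rescale so the weights average to 1
}

# Usage with 5% / 95% classes, as in the simulation in this question
set.seed(1)
cls <- sample(1:2, 2000, prob = c(0.05, 0.95), replace = TRUE)
w <- balanced_weights(cls)
tapply(w, cls, sum)   # both classes now carry the same total weight (about 1000 each)
```

The result could then be passed as nnet(..., weights = w). Rescaling to mean 1 keeps the total weight equal to the number of cases. This matters because nnet's weight decay penalty is not scaled by the case weights.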

The R function nnet has a "weights" argument, and I would like to know exactly what it does. The help file only says "(case) weights for each example – if missing defaults to 1", which is not very clear to me. I originally thought that the weights affected only the choice of classification threshold, not the back-propagation algorithm. However, this naive guess seems to be wrong. To check it, I generated a very simple dataset with two unevenly distributed classes:

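 library(nnet)

 set.seed(1)                  # reproducible simulation
 p1 <- 0.05                   # proportion of class 1 (minority)
 p2 <- 1 - p1
 Ntot <- 2000
 class <- sample(1:2, Ntot, prob = c(p1, p2), replace = TRUE)
 # The formula interface needs a data frame (not a matrix); a 2-level factor
 # response gives one logistic output fitted by entropy (maximum likelihood)
 dat <- data.frame(class = factor(class),
                   scale(cbind(f1 = rnorm(Ntot, mean = class),
                               f2 = rnorm(Ntot, mean = class, sd = 0.01))))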

Then I fitted two nnet models: one with each case weighted by the proportion of its class, and another with all weights equal to 1.

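 # Case weights proportional to each case's class frequency
 # (class 1 cases get weight p1, class 2 cases get weight p2)
 myWeight <- rep(NA, length(class))
 myWeight[class == 1] <- p1
 myWeight[class == 2] <- p2
 set.seed(1)   # same initial network weights for both fits
 fitw <- nnet(class ~ ., data = dat, weights = myWeight, size = 3, decay = 0.1)
 set.seed(1)
 fit0 <- nnet(class ~ ., data = dat, size = 3, decay = 0.1)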

Next, I computed the raw fitted responses (which range between 0 and 1):

 pred.raw.w <- predict(fitw,type="raw")
 pred.raw0 <- predict(fit0,type="raw")

 head(pred.raw.w)
 head(pred.raw0)

If my naive guess were true, the two fits would give the same raw responses. Instead, they differ. This means the weights must enter the back-propagation computation itself, not just the threshold. Can anyone explain what exactly the weights do, or point me to a reference?
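A simplified way to see why the fits differ is to write out a case-weighted criterion and its gradient. The sketch below assumes nnet minimizes the sum over cases of weight × per-case error, plus decay × (sum of squared network weights). This is consistent with the help's description of the returned value as "fitting criterion plus weight decay term". It uses a single logistic unit with no hidden layer as a stand-in for the network. It is not nnet's actual code, and the names weighted_criterion and weighted_gradient are hypothetical.

```r
# Simplified stand-in for nnet (an assumption, not nnet's code): one logistic
# output unit, no hidden layer, criterion = sum_i w_i * entropy_i + decay * sum(theta^2)
weighted_criterion <- function(theta, X, tgt, w, decay) {
  y <- plogis(drop(X %*% theta))                  # output in (0, 1)
  ent <- -(tgt * log(y) + (1 - tgt) * log(1 - y)) # per-case cross-entropy
  sum(w * ent) + decay * sum(theta^2)
}

# Analytic gradient: each case's error (y_i - t_i) is scaled by its weight w_i
# before being propagated back to the parameters.
weighted_gradient <- function(theta, X, tgt, w, decay) {
  y <- plogis(drop(X %*% theta))
  drop(crossprod(X, w * (y - tgt))) + 2 * decay * theta
}

set.seed(1)
X <- cbind(1, matrix(rnorm(40), ncol = 2))   # 20 cases: bias column + 2 inputs
tgt <- rbinom(20, 1, 0.3)                     # 0/1 targets
theta <- c(0.1, -0.2, 0.3)
w_equal <- rep(1, 20)
w_class <- ifelse(tgt == 1, 0.05, 0.95)       # mimics the question's myWeight

g_equal <- weighted_gradient(theta, X, tgt, w_equal, decay = 0.1)
g_class <- weighted_gradient(theta, X, tgt, w_class, decay = 0.1)
rbind(g_equal, g_class)   # different gradients from the same starting point

# Central finite-difference check that the analytic gradient is correct
eps <- 1e-6
g_num <- sapply(seq_along(theta), function(j) {
  e <- replace(numeric(length(theta)), j, eps)
  (weighted_criterion(theta + e, X, tgt, w_class, 0.1) -
     weighted_criterion(theta - e, X, tgt, w_class, 0.1)) / (2 * eps)
})
all.equal(g_num, g_class, tolerance = 1e-6)
```

Because every per-case error term is multiplied by its weight, the optimizer follows different gradients and reaches different network weights. That is why the raw outputs of fitw and fit0 differ even before any threshold is applied.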


Solution

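"Case weights" refers to importance weighting of each observation. Each case's contribution to the fitting criterion is multiplied by its weight, so heavily weighted cases pull the fit, and hence the back-propagated gradients, more strongly. Weights can be used to tailor the ML algorithm to focus on certain aspects of the data.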
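Two properties of such a weighted criterion help interpret it. The sketch assumes the same form as before: case-weighted entropy plus a decay term that does not depend on the case weights. First, an integer weight behaves like duplicating the case. Second, multiplying all weights by a constant k is equivalent to dividing the decay by k. The second point suggests that in the question's fitw, where every weight is below 1 (0.05 or 0.95) while decay = 0.1 is unchanged, the penalty is relatively stronger than in fit0. That is one more reason the two fits differ. The function criterion is hypothetical.

```r
# Assumed criterion (sketch): case-weighted error plus a weight-decay penalty
# that does not depend on the case weights.
criterion <- function(theta, X, tgt, w, decay) {
  y <- plogis(drop(X %*% theta))
  sum(w * -(tgt * log(y) + (1 - tgt) * log(1 - y))) + decay * sum(theta^2)
}

set.seed(2)
X <- cbind(1, rnorm(10))       # 10 cases: bias column + 1 input
tgt <- rbinom(10, 1, 0.5)
theta <- c(0.2, -0.5)

# 1. A weight of 2 on case 1 is the same as listing case 1 twice.
w <- rep(1, 10); w[1] <- 2
dup <- c(1, 1:10)              # row indices with case 1 repeated
lhs <- criterion(theta, X, tgt, w, decay = 0.1)
rhs <- criterion(theta, X[dup, ], tgt[dup], rep(1, 11), decay = 0.1)
all.equal(lhs, rhs)

# 2. Multiplying every weight by k equals dividing decay by k, up to an
#    overall factor k (which does not move the minimum).
k <- 0.05
all.equal(criterion(theta, X, tgt, k * w, decay = 0.1),
          k * criterion(theta, X, tgt, w, decay = 0.1 / k))
```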

Take, for example, the problem of forecasting sales for a store. Accurate projections around weekends and holidays may matter more, because most of a store's sales volume occurs at those times. You could then assign a column of weights that gives weekdays a weight of 1 and weekends/holidays a weight of 2.
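A minimal sketch of building such a weight column in R follows. The date range and the holiday list are illustrative assumptions, not real store data. The weekday is taken from as.POSIXlt(...)$wday rather than weekdays(), because weekdays() returns locale-dependent names.

```r
# Illustrative calendar: the date range and holiday list are assumptions
dates <- seq(as.Date("2024-12-16"), as.Date("2024-12-31"), by = "day")
holidays <- as.Date(c("2024-12-25", "2024-12-26"))

wday <- as.POSIXlt(dates)$wday                 # 0 = Sunday, 6 = Saturday (locale-independent)
is_peak <- wday %in% c(0, 6) | dates %in% holidays
sales_weights <- ifelse(is_peak, 2, 1)         # weekdays 1, weekends/holidays 2

table(sales_weights)   # 10 regular days, 6 weekend/holiday days
# These would then be supplied as nnet(..., weights = sales_weights)
```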

Licensed under: CC-BY-SA with attribution