Leave one out cross validation with lm function in R

https://stackoverflow.com/questions/21371090

03-10-2022

Question

I have a dataset of 506 rows on which I am performing leave-one-out cross-validation. Once I have the mean squared errors, I compute their mean. This value changes every time I run it. Is this expected? If so, can someone please explain why it changes every time I run it?

To do leave-one-out CV, I shuffle the rows first (df is the data frame):

df <- df[sample.int(nrow(df)), ]

Then I split the data frame into 506 data frames, pass each one to lm(), and get the MSE for each data frame (in this case, each row):

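# Name the response explicitly: with train[,lastcolumn] on the left-hand side, `.` would also include that column as a predictor
fit <- lm(reformulate(".", response = names(train)[lastcolumn]), data = train)
pred <- predict(fit, newdata = test)
# Mean squared prediction error on the held-out row(s)
mse <- mean((pred - test[, lastcolumn])^2)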

Then I take the mean of all the MSEs I got.

Every time I run all of this, I get a different mean. Is this expected?
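For reference, the steps described above can be sketched as one loop. This is only a minimal sketch, not the exact code from the question: the built-in mtcars data stands in for the 506-row data frame, mpg stands in for the last column, and loocv_mse is a hypothetical helper name. Because leave-one-out uses every row as the test set exactly once, shuffling the rows only changes the order of the folds, so in this sketch the two means should agree up to floating-point rounding.

```r
# Hypothetical helper: leave-one-out CV for lm().
# `response` is the name of the response column in `df`.
loocv_mse <- function(df, response) {
  # Build "response ~ ." so the response is not also used as a predictor
  form <- reformulate(".", response = response)
  errs <- vapply(seq_len(nrow(df)), function(i) {
    fit  <- lm(form, data = df[-i, , drop = FALSE])        # train on all rows but i
    pred <- predict(fit, newdata = df[i, , drop = FALSE])  # predict the held-out row
    as.numeric((pred - df[i, response])^2)                 # squared error for row i
  }, numeric(1))
  mean(errs)
}

# Assumption for illustration: mtcars replaces the original data frame
mse_original <- loocv_mse(mtcars, "mpg")

# Shuffle the rows as in the question, then repeat the procedure
set.seed(1)
shuffled     <- mtcars[sample.int(nrow(mtcars)), ]
mse_shuffled <- loocv_mse(shuffled, "mpg")

# Compare the two means with a numerical tolerance
same_up_to_rounding <- isTRUE(all.equal(mse_original, mse_shuffled))
print(c(original = mse_original, shuffled = mse_shuffled))
print(same_up_to_rounding)
```

If the two values differ by more than rounding error in a real run, the difference is more likely to come from how the folds are built than from lm() itself.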

Solution

Leave-one-out cross-validation is a validation paradigm, not a prediction algorithm. You need to state which estimator or prediction algorithm you are using and check whether it randomly initializes any of its parameters. If that initialization changes from run to run, the fitted model, and therefore the LOOCV error, can differ every time the underlying algorithm is run.

For example, a Gaussian mixture model used for classification, with different initial means and covariances on each run, will not necessarily perform the same in every LOOCV run. Gaussian mixture models and K-means algorithms typically pick random data points as the initial means. The number of Gaussians in the mixture can also change with the initialization if an information-theoretic criterion is used to estimate it.
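The effect of random initialization can be illustrated with a short sketch. It assumes the built-in iris and mtcars data sets purely for illustration and uses kmeans() from base R's stats package as the randomly initialized algorithm. For contrast, it also fits lm() twice; lm() is a least-squares fit with no random initialization, so repeated fits on the same data return the same coefficients.

```r
# Illustration data (an assumption, not from the question)
x <- as.matrix(iris[, 1:4])

# kmeans() picks random rows as starting centres, so runs with different
# seeds may (but need not) end in different solutions.
set.seed(1)
km_a <- kmeans(x, centers = 3, nstart = 1)
set.seed(2)
km_b <- kmeans(x, centers = 3, nstart = 1)
print(c(run_a = km_a$tot.withinss, run_b = km_b$tot.withinss))

# Fixing the seed before each run makes the result reproducible.
set.seed(42)
km_c <- kmeans(x, centers = 3)
set.seed(42)
km_d <- kmeans(x, centers = 3)
kmeans_reproducible <- identical(km_c$cluster, km_d$cluster)

# lm() has no random initialization: same data, same fit.
fit1 <- lm(mpg ~ wt + hp, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
lm_deterministic <- identical(coef(fit1), coef(fit2))

print(c(kmeans_reproducible = kmeans_reproducible,
        lm_deterministic    = lm_deterministic))
```

When an algorithm of the first kind is evaluated with LOOCV, calling set.seed() before the whole procedure is one way to make the reported error repeatable.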

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow