Question

collection <- data.frame(col1=X1,col2=X2,col3=X3,col4=X4)
k <- 5
ind <- sample(seq(1,k), length(X1), replace=TRUE)

test_ind = which(ind==1)
train<-collection[-test_ind,]
fit<-lm(X1~poly(X2,2,raw=T)+X3+X4+X2:X3,data=train)
model1_resid<-predict(fit,collection[test_ind,2:4])

Warning message: 'newdata' had 105 rows but variables found have 444 rows

BTW: length(test_ind) is 105 and nrow(train)=444

I plan to run cross validation, but the above code generates the warning, I already followed other posts in this forum to do subsetting before I enter the lm function, why there is still warning? Anyone can point out the bug? Thanks

Was it helpful?

Solution

I think you need to use the same variable names, so if you want to use columns 2,3,4 for your prediction, the names shoult be X1, X2, X3 as they are used for the model (not col2, col3 and col4 as you have).

Try for example colnames(collection) = c("X0", "X1", "X2", "X3") before the predict call and it should work (although I don't understand if you really wanted to use col2, col3 and col4 for predicting).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top