So, looking at the data, there are three reasons for the failure. First:
> str(x)
'data.frame': 100 obs. of 34 variables:
$ f2 : Factor w/ 10 levels "1","2","3","4",..: 8 8 8 8 9 8 9 9 7 8 ...
<snip>
`rfe` fits an `lm` model to these data, and that fit generates 39 coefficients even though the data frame `x` has only 34 columns (the factor `f2` is expanded into a set of dummy variables). As a result, `rfe` gets... confused. Try using `model.matrix` to convert the factor to dummy variables yourself before running `rfe`:
x2 <- model.matrix(~., data = x)[,-1] ## the -1 removes the intercept column
... but...
> table(x$f2)
1 2 3 4 6 7 8 9 10 11
0 0 0 2 2 5 32 36 23 0
so `model.matrix` will generate some zero-variance predictors (which is an issue). You could make a new factor that excludes the empty levels (a quick sketch is below), but keep in mind that any resampling of these data can still turn the sparse levels (e.g. "4", "6") into zero-variance predictors within a resample.
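For example, a minimal sketch using base R's `droplevels` (the resampling caveat still applies):

x$f2 <- droplevels(x$f2)  ## drops the empty levels "1", "2", "3", "11"
table(x$f2)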
Second, there is perfect correlation between some of the predictors:
> cor(x$f597, x$f599)
[,1]
[1,] 1
This will cause `NA` values for some of the model coefficients, lead to missing variable importances, and tank `rfe`.
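To see why, here is a minimal sketch with made-up data (the names `a`, `b`, and `y_demo` are just for illustration): `lm` drops one of a pair of perfectly correlated predictors from the fit, so its coefficient comes back `NA`:

set.seed(1)
a <- rnorm(20)
b <- a                    ## perfect copy of a, so cor(a, b) == 1
y_demo <- a + rnorm(20)   ## made-up response
coef(lm(y_demo ~ a + b))  ## the coefficient for b is NA (rank-deficient fit)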
Unless you are using trees or some other model that is tolerant of sparse and/or correlated predictors, a possible workflow prior to `rfe` could be:
> x2 <- model.matrix(~., data = x)[,-1]  ## dummy variables, intercept removed
>
> nzv <- nearZeroVar(x2)                 ## index the near-zero-variance columns
> x3 <- x2[, -nzv]
>
> corr_mat <- cor(x3)
> too_high <- findCorrelation(corr_mat, cutoff = .9)  ## columns to drop so no pair has |cor| > .9
> x4 <- x3[, -too_high]
>
> c(ncol(x2), ncol(x3), ncol(x4))
[1] 42 37 27
Lastly, by the looks of `y`, you want to predict a number, but `lrFuncs` is for logistic regression, so I assume it was a typo for `lmFuncs`. If that is the case, `rfe` works fine:
> subsets <- c(1:5, 10, 15, 20, 25)
> ctrl <- rfeControl(functions = lmFuncs,
+                    method = "repeatedcv",
+                    repeats = 1,
+                    number = 5)
> set.seed(1)
> lrProfile <- rfe(as.data.frame(x4), y,
+                  sizes = subsets,
+                  rfeControl = ctrl)
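Assuming that fit runs cleanly, you can then inspect the results, e.g.:

lrProfile              ## prints the resampled performance for each subset size
predictors(lrProfile)  ## the predictors in the best subset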
Max