Question

I was trying to automate a piece of my code so that programming becomes less tedious.

Basically, I was trying to do a stepwise selection of variables using fastbw() in the rms package. I would like to pass the variables selected by fastbw() into a formula of the form y ~ x1+x2+x3, where "x1", "x2", "x3" are the variables selected by fastbw().

Here is the code I tried, which did not work:

olsOAW0.r060 <- ols(roll_pct~byoy+trans_YoY+change18m, 
                    subset= helper=="POPNOAW0_r060", 
                    na.action = na.exclude, 
                    data = modelready)

OAW0 <- fastbw(olsOAW0.r060, rule="p", type="residual", sls= 0.05)

vec <- as.vector(OAW0$names.kept, mode="any")

b <- paste(vec, sep ="+") ##I even tried b <- paste(OAW0$names.kept, sep="+")

bestp.OAW0.r060 <- lm(roll_pct ~ b , 
                      data = modelready, 
                      subset = helper =="POPNOAW0_r060",    
                      na.action = na.exclude)

I am new to R and still haven't climbed the steep learning curve, so I apologize for any obvious programming blunders.


Solution

You're almost there. You just have to paste the entire formula together, something like this:

paste("roll_pct ~ ",b,sep = "")

Then coerce it to an actual formula using as.formula and pass that to lm. Technically, I think lm may coerce a character string itself, but coercing it yourself is generally safer. (Some functions that expect formulas won't do the coercion for you, others will.)
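Put together, the steps look roughly like the sketch below. It uses the built-in mtcars data as a stand-in for modelready, and the vector kept is a hypothetical substitute for OAW0$names.kept; the subset condition is likewise only illustrative.

```r
# Hypothetical stand-in for OAW0$names.kept (the names fastbw() retained)
kept <- c("wt", "hp")

# Collapse the names into a single right-hand side, then prepend the response
rhs <- paste(kept, collapse = " + ")
fml <- as.formula(paste("mpg ~", rhs))

# Pass the formula object to lm() like any hand-written formula
fit <- lm(fml, data = mtcars, subset = cyl > 4, na.action = na.exclude)
```

The same pattern should carry over to the original call by swapping in roll_pct, OAW0$names.kept and modelready.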

OTHER TIPS

You would actually need to use collapse instead of sep when defining b.

b <- paste(OAW0$names.kept, collapse="+")

Then you can plug it into joran's answer:

paste("roll_pct ~ ",b,sep = "")

or just use:

paste("roll_pct ~ ",paste(OAW0$names.kept, collapse="+"),sep = "")
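To see why sep does not help here, compare the two arguments on a plain character vector (vars is just an illustrative name). sep only separates the different arguments passed to paste, while collapse joins the elements of the result into one string.

```r
vars <- c("x1", "x2", "x3")

# sep has nothing to join, because there is only one argument
with_sep <- paste(vars, sep = "+")            # "x1" "x2" "x3"

# collapse glues the elements of the vector into a single string
with_collapse <- paste(vars, collapse = "+")  # "x1+x2+x3"
```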

I ran into a similar issue today. If you want to make it even more generic, so that you don't have to hard-code the class (response) name, you can use:

frmla <- as.formula(paste(colnames(modelready)[1], paste(colnames(modelready)[2:ncol(modelready)], sep = "", 
                              collapse = " + "), sep = " ~ "))

This assumes that the class variable (the dependent variable) is in the first column, but the indexing can easily be switched to the last column:

frmla <- as.formula(paste(colnames(modelready)[ncol(modelready)], paste(colnames(modelready)[1:(ncol(modelready)-1)], sep = "", 
                              collapse = " + "), sep = " ~ "))

Then continue with lm using:

bestp.OAW0.r060 <- lm(frmla , data = modelready, ... )
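One caveat with using every remaining column: a column that exists only for subsetting (like helper in the question) would also end up as a predictor. A possible variation is sketched below using base R's setdiff() and reformulate(); mtcars stands in for modelready, and the choice of response and excluded column is purely hypothetical.

```r
response <- "mpg"
exclude  <- c("cyl")  # hypothetical: columns used only for subsetting

# Every column except the response and the excluded ones becomes a predictor
predictors <- setdiff(colnames(mtcars), c(response, exclude))

# reformulate() builds the formula directly from character vectors
frmla <- reformulate(predictors, response = response)

fit <- lm(frmla, data = mtcars, subset = cyl > 4)
```

reformulate() avoids the nested paste() calls, though the paste() plus as.formula() route produces an equivalent formula.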

If you're looking for something less verbose:

fm <- as.formula( paste( colnames(df)[i], ".", sep=" ~ ")) 
                                      # i is the index of the outcome column

Here it is in a function:

library(rpart)

getFormula <- function(target, df) {
  # match() finds the exact column name; grep() would also match partial names
  i <- match(target, colnames(df))
  as.formula(paste(colnames(df)[i], 
                   ".", 
                   sep = " ~ "))
}
fm <- getFormula("myOutcomeColumnName", myDataFrame)
rp <- rpart(fm, data = myDataFrame) # Use the formula to build a model

Just to simplify and collect the answers above into a function:

my_formula<- function(colPosition, trainSet){
    dep_part<- paste(colnames(trainSet)[colPosition],"~",sep=" ")
    ind_part<- paste(colnames(trainSet)[-colPosition],collapse=" + ")
    dt_formula<- as.formula(paste(dep_part,ind_part,sep=" "))
    return(dt_formula)
}

To use it:

my_formula( dependent_var_position, myTrainSet)
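As a concrete illustration, here is the same function applied to a three-column slice of the built-in mtcars data. The data frame small and the column choice are assumptions made for the example, not part of the original answer.

```r
my_formula <- function(colPosition, trainSet) {
  dep_part <- paste(colnames(trainSet)[colPosition], "~", sep = " ")
  ind_part <- paste(colnames(trainSet)[-colPosition], collapse = " + ")
  as.formula(paste(dep_part, ind_part, sep = " "))
}

# Hypothetical training set: response in column 1, predictors after it
small <- mtcars[, c("mpg", "wt", "hp")]

f <- my_formula(1, small)    # mpg ~ wt + hp
fit <- lm(f, data = small)
```

Because every other column becomes a predictor, the data frame should contain only the columns you actually want in the model.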

One trick that I use in similar situations is to subset your data and simply use e.g. lm(dep_var ~ ., data = your_data).

Example

data(mtcars)
ind_vars <- c("mpg", "cyl")
dep_var <- "hp"

# all_of() selects columns named in a character vector without ambiguity
temp_subset <- dplyr::select(mtcars, dplyr::all_of(c(dep_var, ind_vars)))

lm(hp ~., data = temp_subset)
Licensed under: CC-BY-SA with attribution