Question

I want to write a function that takes a variable number of inputs and regresses the first input on the rest. Specifically, if the function were supplied with 2, 3, or 4 variables, I would define it as:

egen_neut <- function(x, y) residuals(lm(x ~ y, na.action = na.exclude))
egen_neut <- function(x, y, z) residuals(lm(x ~ y + z, na.action = na.exclude))
egen_neut <- function(x, y, z, w) residuals(lm(x ~ y + z + w, na.action = na.exclude))

How can I convert the dot-dot-dot, i.e. "...", so that it is interpreted as a formula with a "+" between the variables? In other words, what should go in place of the ????? below?

egen_neut <- function(x, ...) {
  residuals(lm(x ~ ?????, na.action = na.exclude))
}

Solution

Here's one way to do it:

ff <- function(x, ...) {
    mc <- as.list(match.call())[-1]
    ll <- as.character(mc[[1]])
    rr <- paste(sapply(mc[-(1)], as.character), collapse="+")
    fm <- as.formula(paste(ll, "~", rr))

    # now you can execute `lm` as:
    lm(fm, na.action = na.exclude)
}

# now you can do
ff(x, y, z, w)

This works when all of the input variables reside in the global environment. Hope this helps.
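As an alternative that does not depend on where the variables live, here is a minimal sketch that passes the vectors by value: it collects `x` and the `...` arguments into a temporary data frame and lets `.` in the formula stand for "all other columns", which amounts to `y + z + w`. The column name `.resp` is a hypothetical placeholder chosen for this illustration, not something from the answer above.

```r
# Sketch: build a temporary data frame from the arguments, then regress
# the response column on every other column via `.`.
egen_neut <- function(x, ...) {
  preds <- data.frame(...)                       # one column per extra argument
  dat <- cbind(data.frame(.resp = x), preds)     # `.resp` is a placeholder name
  residuals(lm(.resp ~ ., data = dat, na.action = na.exclude))
}

set.seed(1)
x <- rnorm(20); y <- rnorm(20); z <- rnorm(20)
x[3] <- NA                                       # na.exclude keeps this slot as NA

r1 <- egen_neut(x, y, z)
r2 <- residuals(lm(x ~ y + z, na.action = na.exclude))

# Check against the hand-written formula from the question.
stopifnot(isTRUE(all.equal(unname(r1), unname(r2))))
```

Because the data frame is built inside the function, this version can also be called from inside other functions, at the cost of copying the inputs.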


Since you're dealing with a data.frame, this is how I'd do it:

ff <- function(df, ...) {
    mc <- as.list(match.call())[-(1:2)]
    ll <- as.character(mc[[1]])
    rr <- paste(sapply(mc[-(1)], as.character), collapse="+")
    fm <- as.formula(paste(ll, "~", rr))

    # now you can execute `lm` as:
    lm(fm, data = df, na.action = na.exclude)
}

Suppose your data.frame is DF with columns x, y, z and you want to do x ~ y then:

ff(DF, x, y)

Or x ~ y + z, then:

ff(DF, x, y, z)

You get the idea.
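For comparison, a hedged sketch of the same data-frame idea that takes the column names as strings and assembles the formula with `reformulate()` from the stats package instead of `paste()`. The function name `egen_neut_df` and the sample data are made up for illustration, and unlike `ff` this version returns the residuals the question asks for rather than the fitted model.

```r
# Sketch: column names arrive as character strings; reformulate() builds
# the formula response ~ term1 + term2 + ...
egen_neut_df <- function(df, response, ...) {
  predictors <- c(...)                                 # e.g. c("y", "z")
  fm <- reformulate(predictors, response = response)   # x ~ y + z
  residuals(lm(fm, data = df, na.action = na.exclude))
}

# Hypothetical sample data with one missing response value.
DF <- data.frame(x = c(1.2, NA, 3.1, 4.8, 5.0, 6.3),
                 y = c(1, 2, 3, 4, 5, 6),
                 z = c(2, 1, 4, 3, 6, 8))

res <- egen_neut_df(DF, "x", "y", "z")

# With na.exclude the residual vector is padded back to nrow(DF).
stopifnot(length(res) == nrow(DF), is.na(res[2]))
```

Passing names as strings avoids the `match.call()` bookkeeping, which can matter when the function is called programmatically with names stored in a variable.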

OTHER TIPS

You don't have to do that. Just write your function to take a formula argument:

egen_neut <- function(fm)
    resid(lm(fm, na.action = na.exclude))

egen_neut(x ~ y)
egen_neut(x ~ y + z)
egen_neut(x ~ y + z + w)

If you put all the variables into a data frame, you can loop over subsets of the columns and use a . in the formula to fit an additive model on every column of the subset that is not already in the formula. Suppose your data frame is d, with x as the first column followed by y, z, etc. Then something like

sapply(seq(2, length(d)), function(ix, d) residuals(lm(x ~ ., d[, seq(ix)])), d = d)

should do the trick, but this is untried, so it might need some tweaking.
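A small sketch of that loop on made-up data, assuming the response `x` is the first column of `d`. Each pass fits `x` on one more predictor, and `sapply` gathers the residual vectors into a matrix with one column per model.

```r
# Hypothetical data: x is the response and must be column 1.
d <- data.frame(x = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2),
                y = c(1, 2, 3, 4, 5, 6),
                z = c(0.5, 0.1, 0.9, 0.3, 0.7, 0.2))

# Model k uses columns 1..(k + 1): first x ~ y, then x ~ y + z.
res <- sapply(seq(2, ncol(d)), function(ix) {
  residuals(lm(x ~ ., data = d[, seq_len(ix), drop = FALSE],
               na.action = na.exclude))
})

# One row per observation, one column per fitted model.
stopifnot(identical(dim(res), c(nrow(d), ncol(d) - 1L)))

# The first column matches the residuals of the explicit x ~ y fit.
stopifnot(isTRUE(all.equal(unname(res[, 1]),
                           unname(residuals(lm(x ~ y, data = d))))))
```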

Licensed under: CC-BY-SA with attribution