In this link they show how to use lm() with a data frame:

Right way to use lm in R

However (being completely new to R) I'm still a little unclear on the syntax.

Is there more to this than the addition of the . after y ~, or does it simply denote that you have moved from a vector input to a data frame input?


Solution

The . notation in a formula is commonly taken to mean "all other variables in data that do not already appear in the formula". Consider the following:

df <- data.frame(y = rnorm(10), A = runif(10), B = rnorm(10))
mod <- lm(y ~ ., data = df)
coef(mod)

R> coef(mod)
(Intercept)           A           B 
    -0.8389      0.5635     -0.2160

Ignore the values above; what is important is that there are two terms in the model (plus the intercept), taken from the columns in names(df) other than y. This is exactly the same as writing out the full formula

mod <- lm(y ~ A + B, data = df)

but involves less typing. It is a convenient shortcut when the model formula might include many variables.
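
If you want to see what the . expands to without fitting anything, one way (a minimal sketch, not the only approach) is to ask terms() for the term labels, giving it the data frame so it can resolve the dot. The set.seed() call and the names expanded, mod_dot, mod_full and same_coef are assumptions added here for illustration; they are not part of the original example.

```r
# Assumption: a fixed seed, only so the illustration is reproducible
set.seed(1)
df <- data.frame(y = rnorm(10), A = runif(10), B = rnorm(10))

# terms() needs `data` to resolve the dot; the labels are the
# right-hand-side variables that . stands for
expanded <- attr(terms(y ~ ., data = df), "term.labels")
print(expanded)

# Fit both ways and compare the coefficients
mod_dot  <- lm(y ~ ., data = df)
mod_full <- lm(y ~ A + B, data = df)
same_coef <- isTRUE(all.equal(coef(mod_dot), coef(mod_full)))
print(same_coef)
```

Here expanded should hold the names A and B, and same_coef should be TRUE, since both calls describe the same model.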

The other place this crops up is in update(), where the second argument is a formula and one uses . to indicate "what was already there". For example:

coef(update(mod, . ~ . - B))

R> coef(update(mod, . ~ . - B))
(Intercept)           A 
    -0.8156      0.5919

Hence the first ., to the left of ~, expands to "keep the existing response variable y", whilst the second ., to the right of ~, expands to A + B, so we have A + B - B, which cancels to A.
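
The same substitution can be seen on a bare formula, since update() also works on formula objects without a fitted model. The following is a rough sketch; the names f, f_drop, f_add and f_log are made up for illustration, and C is a hypothetical variable that does not exist in df (no data is needed just to rewrite a formula).

```r
f <- y ~ A + B

# . on each side stands for "what was already there" on that side
f_drop <- update(f, . ~ . - B)   # remove B from the right-hand side
f_add  <- update(f, . ~ . + C)   # add a (hypothetical) variable C
f_log  <- update(f, log(.) ~ .)  # transform the response, keep the terms

print(f_drop)
print(f_add)
print(f_log)
```

The printed formulas should be y ~ A, y ~ A + B + C and log(y) ~ A + B respectively.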

Licensed under: CC-BY-SA with attribution