lm options, do regression of each category [duplicate]

https://stackoverflow.com//questions/12667782

11-12-2019
Question

Data:

Y   X   levels
y1  x1  2

...

lm(Y~X,I(levels==1))

Does I(levels==1) restrict the regression to the rows where levels==1? If not, how can I regress Y on X using only the rows where levels equals 1?

Solution

lm has a subset argument for this. Here is an example:

x <- rnorm(100)
y <- rnorm(100, sd=0.1) 
y[1:50] <- y[1:50] + 3*x[1:50] + 10  # line y = 3x+10
y[51:100] <- y[51:100] + 8*x[51:100] - 5 # line y = 8x-5
levels <- rep(1:2, each=50)  # 50 rows of level 1, then 50 rows of level 2

data <- data.frame(x=x, y=y, levels=levels)

lm(y ~ x, data=data, subset=levels==1) # regression for the first part

Coefficients:
(Intercept)            x  
     10.015        2.996  

lm(y ~ x, data=data, subset=levels==2) # second part

Coefficients:
(Intercept)            x  
     -4.986        8.000  

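Naming the argument matters here. The formal arguments of lm start with formula, data, subset, so in lm(Y~X, I(levels==1)) the unnamed second argument is matched to data, not subset. It falls through to subset only when data is supplied by name, as in lm(y ~ x, I(levels==1), data=data). Writing subset=levels==1 explicitly avoids the ambiguity.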
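To check which argument an unnamed expression ends up in, match.call can report the matching without fitting anything. This is a minimal sketch using only base R; the names Y, X, levels and d inside the quoted calls are placeholders and are never evaluated.

```r
# match.call() applies R's argument-matching rules to a quoted call
# without evaluating it, so Y, X, levels and d do not need to exist.
positional <- match.call(lm, quote(lm(Y ~ X, I(levels == 1))))
named_data <- match.call(lm, quote(lm(Y ~ X, I(levels == 1), data = d)))

print(positional)  # shows which formal argument each expression was matched to
print(named_data)

# Inspect the matched argument names directly.
names(positional)
names(named_data)
```

In the first call the expression should appear under data, in the second under subset, which is why spelling out subset= is the safer habit.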

OTHER TIPS

Have a look at lmList from the nlme package:

set.seed(12345)
dataset <- data.frame(x = rnorm(100), y = rnorm(100), levels = gl(2, 50))
dataset$y <- with(dataset, 
    y + (0.1 + as.numeric(levels)) * x + 5 * as.numeric(levels)
)
library(nlme)
models <- lmList(y ~ x|levels, data = dataset)

The output is a list of lm models, one per level:

models

Call:
  Model: y ~ x | levels 
   Data: dataset 

Coefficients:
  (Intercept)        x
1    4.964104 1.227478
2   10.085231 2.158683

Degrees of freedom: 100 total; 96 residual
Residual standard error: 1.019202

Here is the summary of the first model:

summary(models[[1]])

Call:
lm(formula = form, data = dat, na.action = na.action)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.16569 -1.04457 -0.00318  0.78667  2.65927 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4.9641     0.1617  30.703  < 2e-16 ***
x             1.2275     0.1469   8.354 6.47e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.128 on 48 degrees of freedom
Multiple R-squared: 0.5925,     Adjusted R-squared: 0.584 
F-statistic: 69.78 on 1 and 48 DF,  p-value: 6.469e-11
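If nlme is not wanted, a similar per-level list of fits can be built in base R with split and lapply. The sketch below rebuilds the same dataset and checks that both routes give the same coefficients; by_level, coef_lmlist and coef_split are names introduced here for illustration, and nlme is assumed to be installed (it normally ships with R as a recommended package).

```r
library(nlme)

set.seed(12345)
dataset <- data.frame(x = rnorm(100), y = rnorm(100), levels = gl(2, 50))
dataset$y <- with(dataset,
    y + (0.1 + as.numeric(levels)) * x + 5 * as.numeric(levels)
)

# One lm per level via nlme
models <- lmList(y ~ x | levels, data = dataset)

# Base-R alternative: split the rows by level, then fit lm on each piece
by_level <- lapply(split(dataset, dataset$levels),
                   function(d) lm(y ~ x, data = d))

# Collect the coefficients of every per-level model into a matrix
coef_lmlist <- sapply(models, coef)
coef_split  <- sapply(by_level, coef)

# Compare the two sets of estimates, ignoring dimnames
all.equal(unname(coef_lmlist), unname(coef_split))
```

The split/lapply version returns a plain named list, so individual models are reached the same way, for example summary(by_level[["1"]]).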

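I was not sure, but the code below suggests that you are correct: with data passed by name, the unnamed I(level==1) acts as a subset, and the fit should match a regression run on only the level-1 rows.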

my.data <- "x y level 
            1  2 1
            2  4 2
            3  4 1
            4  3 2
            5  5 1
            6  5 2
            7  7 1
            8  6 2
            9 10 1
            10 5 2"

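my.data2 <- read.table(text = my.data, header = TRUE)
my.data2

# data is given by name, so the unnamed I(level==1) is matched to subset
lm(x ~ y, I(level==1), data=my.data2)

# same regression on a data set that contains only the level-1 rows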
my.data <- "x y level 
            1  2 1
            3  4 1
            5  5 1
            7  7 1
            9 10 1"

my.data2 <- read.table(text = my.data, header = TRUE)
my.data2

lm(x ~ y, data=my.data2)
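Rather than comparing the two printouts by eye, the coefficients can be compared programmatically. This is a small sketch on the same ten rows; fit_subset and fit_rows are names introduced here, and the second fit selects the level-1 rows by indexing instead of retyping them.

```r
my.data2 <- read.table(text = "x y level
1  2 1
2  4 2
3  4 1
4  3 2
5  5 1
6  5 2
7  7 1
8  6 2
9 10 1
10 5 2", header = TRUE)

# Unnamed I(level == 1) is matched to subset because data is named
fit_subset <- lm(x ~ y, I(level == 1), data = my.data2)

# Explicitly keep only the level-1 rows before fitting
fit_rows <- lm(x ~ y, data = my.data2[my.data2$level == 1, ])

# Compare the coefficient vectors and the number of rows used
all.equal(coef(fit_subset), coef(fit_rows))
nobs(fit_subset)
nobs(fit_rows)
```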

Licensed under: CC-BY-SA with attribution
