Question

I learned how to fit a line to some points using lm() in my R script. I did that (which worked nicely) and printed out the fit:

lm(formula = y2 ~ x2)

Residuals:
         1          2          3          4 
 5.000e+00 -1.000e+01  5.000e+00  7.327e-15 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   70.000     17.958   3.898  0.05996 . 
x2            85.000      3.873  21.947  0.00207 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 8.66 on 2 degrees of freedom
Multiple R-squared: 0.9959, Adjusted R-squared: 0.9938 
F-statistic: 481.7 on 1 and 2 DF,  p-value: 0.00207 

I'm trying to determine the best way to judge how good this fit is. I need to compare this fit with a few others (which are also linear fits from the lm() function). Which value from this summary would be the best way to judge the quality of the fit? I was thinking of using the residual standard error. Any suggestions? Also, how do I extract that value from the fit variable?


Solution

If you want to access the pieces produced by summary() directly, call summary(), store the result in a variable, and then inspect the resulting object:

rs <- summary(lm1)
names(rs)

Perhaps rs$sigma is what you're looking for?

EDIT

Before someone chides me, I should point out that this is not the recommended way to access some of this information. Rather, you should use the designated extractor functions such as residuals() or coef().
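As a concrete sketch of both routes, here is a small example. The data points are made up for illustration (they happen to reproduce the coefficients and residual standard error shown in the question):

```r
# Made-up data that reproduce the fit in the question (assumption)
x2 <- c(1, 2, 3, 4)
y2 <- c(160, 230, 330, 410)

lm1 <- lm(y2 ~ x2)
rs  <- summary(lm1)

names(rs)        # list the components of the summary object
rs$sigma         # residual standard error, accessed directly

# The designated extractors:
coef(lm1)        # fitted coefficients
residuals(lm1)   # residuals
sigma(lm1)       # residual standard error (available in R >= 3.3.0)
```

The extractor functions are preferred because they keep working even if the internal layout of the summary object changes.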

Other tips

This code would do something similar:

y2 <- seq(1, 11, by = 2) + rnorm(6)  # six data points instead of your four
x2 <- 1:6
lm(y2 ~ x2)
summary(lm(y2 ~ x2))

The adjusted R^2 is the "goodness of fit" measure. It says that roughly 99% of the variance in y2 can be "explained" by a straight-line fit of y2 on x2. Whether you want to interpret a model with only 4 data points on the basis of that result is a matter of judgment; it would seem somewhat dangerous to me.
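If adjusted R^2 is your comparison criterion, it can be extracted from the summary object the same way as sigma. A minimal sketch, with hypothetical data:

```r
# Hypothetical data for illustration
set.seed(1)
x2 <- 1:6
y2 <- seq(1, 11, by = 2) + rnorm(6)

fit <- lm(y2 ~ x2)
summary(fit)$r.squared       # plain R-squared
summary(fit)$adj.r.squared   # adjusted R-squared (penalizes extra predictors)
```

Adjusted R^2 is the safer choice when the models you are comparing have different numbers of predictors, since plain R^2 never decreases as you add terms.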

To extract the residual standard error you use:

summary(lm(y2~x2))$sigma

See this for further details:

?summary.lm

There are some nice regression diagnostic plots you can look at with

plot(YourRegression, which=1:6)

where which=1:6 gives you all six plots. The RESET test and bptest (both from the lmtest package) will test for misspecification and heteroskedasticity:

resettest(...)
bptest(...)
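Since both tests live in the lmtest package rather than base R, a sketch of their use would be (data made up for illustration):

```r
# install.packages("lmtest")  # lmtest is not part of base R
library(lmtest)

# Hypothetical data for illustration
set.seed(1)
x2 <- 1:20
y2 <- 2 * x2 + rnorm(20)
fit <- lm(y2 ~ x2)

resettest(fit)   # Ramsey RESET test for functional-form misspecification
bptest(fit)      # Breusch-Pagan test for heteroskedasticity
```

Both functions accept a fitted lm object (or a formula) and return an htest object, so you can pull out, say, bptest(fit)$p.value for programmatic comparison.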

There are a lot of resources out there to think about this sort of thing. Fitting Distributions in R is one of them, and Faraway's "Practical Regression and Anova" is an R classic. I basically learned econometrics in R from Farnsworth's paper/book, although I don't recall if he has anything about goodness of fit.

If you are going to do a lot of econometrics in R, Applied Econometrics in R is a great (paid) book. And I've used the R for Economists webpage a lot.

Those are the first ones that pop to mind. I will mull a little more.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow