If your goal is model selection there are several tools available in R which attempt to automate this process. Read the documentation on dredge(...)
in the MuMIn
package.
# dredge: example of use
library(MuMIn)
df <- mtcars[,c("mpg","cyl","disp","hp","wt")] # subset of mtcars
full.model <- lm(mpg ~ cyl+disp+hp+wt,df) # model for predicting mpg
dredge(full.model)
# Global model call: lm(formula = mpg ~ cyl + disp + hp + wt, data = df)
# ---
# Model selection table
# (Intrc) cyl disp hp wt df logLik AICc delta weight
# 10 39.69 -1.5080 -3.191 4 -74.005 157.5 0.00 0.291
# 14 38.75 -0.9416 -0.01804 -3.167 5 -72.738 157.8 0.29 0.251
# 13 37.23 -0.03177 -3.878 4 -74.326 158.1 0.64 0.211
# 16 40.83 -1.2930 0.011600 -0.02054 -3.854 6 -72.169 159.7 2.21 0.096
# 12 41.11 -1.7850 0.007473 -3.636 5 -73.779 159.9 2.37 0.089
# 15 37.11 -0.000937 -0.03116 -3.801 5 -74.321 161.0 3.46 0.052
# 11 34.96 -0.017720 -3.351 4 -78.084 165.6 8.16 0.005
# 9 37.29 -5.344 3 -80.015 166.9 9.40 0.003
# 4 34.66 -1.5870 -0.020580 4 -79.573 168.6 11.14 0.001
# 7 30.74 -0.030350 -0.02484 4 -80.309 170.1 12.61 0.001
# 2 37.88 -2.8760 3 -81.653 170.2 12.67 0.001
# 8 34.18 -1.2270 -0.018840 -0.01468 5 -79.009 170.3 12.83 0.000
# 6 36.91 -2.2650 -0.01912 4 -80.781 171.0 13.55 0.000
# 3 29.60 -0.041220 3 -82.105 171.1 13.57 0.000
# 5 30.10 -0.06823 3 -87.619 182.1 24.60 0.000
# 1 20.09 2 -102.378 209.2 51.68 0.000
You should consider these tools to help you make intelligent decisions. Do not let the tool make the decision for you!!!
For example, in this case dredge(...)
suggests that the "best" model for predicting mpg
, based on the AICc criterion, includes cyl
and wt
. But note that AICc for this model is 157.7 whereas the second best model has an AICc of 157.8, so these are basically the same. In fact, the first 5 models in this list are not significantly different in their ability to predict mpg
. It does, however, narrow things down a bit. Among these 5, I would want to look at distribution of residuals (should be normal), trends in residuals (there should be none), and leverage (do some points have undue influence), before picking a "best" model.