You are fitting the model using method = "REML"
which is the restricted likelihood. It does not always follow that the maximized likelihood under REML will be close to the likelihood under unrestricted ML. Set method = "ML"
and see if that fixes the "problem" with AIC/BIC.
The best model, according to both AIC and BIC, contains only a non significant term
-
21-06-2023 - |
Question
Id like to know how is it possible that the best model (m6), based on AIC and BIC values, can have a non significant term whereas the second best model(m5) has a signficant term.
I have the following competing models list:
m1=gls(Area_Km2~size+cent_Latitude+PCptail+pwingbilltar,corMartins(1,phy=ctree),data = c)
m2=gls(Area_Km2~size+cent_Latitude+PCptail,corMartins(1,phy=ctree),data = c)
m3=gls(Area_Km2~size+cent_Latitude,corMartins(1,phy=ctree),data = c)
m4=gls(Area_Km2~size,corMartins(1,phy=ctree),data = c)
m5=gls(Area_Km2~PCptail,corMartins(1,phy=ctree),data = c)
m6=gls(Area_Km2~cent_Latitude,corMartins(1,phy=ctree),data = c)
m7=gls(Area_Km2~pwingbilltar,corMartins(1,phy=ctree),data = c)
Here is model comparison
Model df AIC BIC logLik Test L.Ratio p-value
m1 1 7 147.2775 157.9620 -66.63873
m2 2 6 139.4866 148.8187 -63.74331 1 vs 2 5.790838 0.0161
m3 3 5 133.3334 141.2510 -61.66672 2 vs 3 4.153191 0.0416
m4 4 4 130.7749 137.2186 -61.38746 3 vs 4 0.558517 0.4549
m5 5 4 127.0635 133.5072 -59.53175
m6 6 4 125.1006 131.5443 -58.55029
m7 7 4 132.4542 138.8979 -62.22711
here goes m6
Generalized least squares fit by REML
Model: Area_Km2 ~ cent_Latitude
Data: c
AIC BIC logLik
125.1006 131.5442 -58.55029
Correlation Structure: corMartins
Formula: ~1
Parameter estimate(s):
alpha
1
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.4940905 0.1730082 2.8558795 0.0070
cent_Latitude -0.1592109 0.1726268 -0.9222837 0.3624
Correlation:
(Intr)
cent_Latitude -0.158
Standardized residuals:
Min Q1 Med Q3 Max
-1.3270048 -0.7088524 -0.2828898 0.4672255 2.2203523
Residual standard error: 1.066911
Degrees of freedom: 39 total; 37 residual
here goes m5
Generalized least squares fit by REML
Model: Area_Km2 ~ PCptail
Data: c
AIC BIC logLik
127.0635 133.5072 -59.53175
Correlation Structure: corMartins
Formula: ~1
Parameter estimate(s):
alpha
1
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.19752329 0.20158500 0.9798512 0.3335
PCptail 0.01925621 0.00851536 2.2613499 0.0297
Correlation:
(Intr)
PCptail -0.595
Standardized residuals:
Min Q1 Med Q3 Max
-1.3416127 -0.6677304 -0.2467510 0.3198370 2.3339127
Residual standard error: 1.01147
Degrees of freedom: 39 total; 37 residual
Solution 2
OTHER TIPS
There are at least two things going on here. First, it is not meaningful to assert that the model with the lowest AIC is the "best" model. For a set of models with different AIC, the relative probability that the ith model is better than the model with the lowest AIC is given by: (see here, and references cited therein):
L = exp[ ( AICmin - AICi ) / 2 ]
So, comparing m5
and m6
:
L = exp[ (125.1006 - 127.0635) / 2 ] = 0.374
Or, there is a 37% chance that m5
is in fact the better model. The point is that there is not a significant difference between an AIC of 125.2 and an AIC of 127, so you cannot say the m6
is "best". Both models do about equally well.
So why is cent_Latitude
insignificant in m6
? A p-value > 0.05 means that the effect of cent_Latitude
on the response is small compared to the error in the response. This could happen because the true effect size is 0, or it could happen because the effect size, combined with the range in cent_latitude
results in an effect on the response which is small compared to the error in the response. You can see this in the following, which uses made-up data and creates the same effect yor are seeing with your real data.
Suppose the the response variable actually depends on both cent_Latitude
and PCptail
. In m6
, variability in response due to the effect of PCptail
is counted towards the "error" in the model, reducing the calculated significance of cent_Latitude
. On the other hand, in m5
the variability in response due to the effect of cent_Latitude
is counted toward the error, reducing the significance of PCptail
. If the effect sizes, compared to the true error, are just right you can get this effect, as shown below. Note that this is one of the reasons it is not recommended to compare non-nested models using a single statistic (like AIC, or RSQ, or even F).
library(nlme)
set.seed(1)
# for simplicity, use un-correlated predictors
c <- data.frame(PCptail=sample(seq(0,10,.1),length(seq)),
cent_Latitude=sample(seq(0,1,.01),length(seq)))
# response depends on both predictors
c$Area <- 1 + .01*c$PCptail +.1*c$cent_Latitude + rnorm(nrow(c),0,1)
m6 <- gls(Area~cent_Latitude,c)
m5 <- gls(Area~PCptail,c)
summary(m6)
# Generalized least squares fit by REML
# Model: Area ~ cent_Latitude
# Data: c
# AIC BIC logLik
# 288.5311 296.3165 -141.2656
#
# Coefficients:
# Value Std.Error t-value p-value
# (Intercept) 1.1835936 0.1924341 6.150645 0.0000
# cent_Latitude -0.1882202 0.3324754 -0.566118 0.5726
# ...
summary(m5)
# Generalized least squares fit by REML
# Model: Area ~ PCptail
# Data: c
# AIC BIC logLik
# 289.2713 297.0566 -141.6356
#
# Coefficients:
# Value Std.Error t-value p-value
# (Intercept) 0.7524261 0.18871413 3.987121 0.0001
# PCptail 0.0674115 0.03260484 2.067530 0.0413
# ...
So how to deal with this? Well, have you looked at the residuals plots for all these models? Have you looked at the Q-Q plots? Leverage plots? Generally, all other thins being equal, I would pick the model with the more significant parameters, assuming the residuals were random and normally distributed, and the none of the data points had unusually high leverage.