Question

I have a question regarding p-values. I've been comparing different linear models to determine whether one model is better than another, using the following function in R:

 anova(model1, model2)

Unfortunately, it occasionally will not calculate an F statistic or a p-value. Here is an example of an anova summary that did not give a p-value:

 Analysis of Variance Table

 Model 1: Influence ~ SortedSums[, Combos2[1, A]] + SortedSums[, Combos2[2,A]]
 Model 2: Influence ~ SortedSums[, B]
    Res.Df   RSS Df Sum of Sq F Pr(>F)
 1    127 3090.9                      
 2    128 2655.2 -1    435.74 

For the sake of symmetry, here is also an anova summary that did yield a p-value:

 Analysis of Variance Table

 Model 1: Influence ~ SortedSums[, Combos2[1, A]] + SortedSums[, Combos2[2,A]]
 Model 2: Influence ~ SortedSums[, B]
    Res.Df    RSS Df Sum of Sq      F Pr(>F)
  1    127 3090.9                           
  2    128 3157.6 -1   -66.652 2.7386 0.1004

Do you know why this occurs?

Solution

Not all questions require code examples. You don't deserve to be snarked at for being new, and I'm sorry people did. Here is the answer:

The difference between the two models is not significant. (Notice that in your first table the model with more terms actually has the larger RSS: 3090.9 on 127 residual df versus 2655.2 on 128. For properly nested least-squares fits, the larger model can never have a larger RSS, so the computed F would be negative, and anova prints nothing for F and Pr(>F).)

Here is what you can do about it:

  • Check to make sure that the terms of one model object are a superset of the terms of the other. Otherwise, the default anova test is invalid to begin with (you could instead compare such non-nested models using AIC, but that belongs in a separate question). I'm actually really curious to see a nested pair of models that manages to be that non-significant, but again, it's not necessary for answering this question.
  • If you checked, and the models are nested, and this is an analysis you are doing manually, write p = 1.0 in your report and call it a day.
  • If the models are nested, and the above feels like cheating, here's how to do it the hard way. What you are really asking anova is whether the one variable by which the models differ makes a significant contribution to the fit. Take the "larger" model (call it BAR, and call the smaller one FOO) and run summary(BAR). The p-value corresponding to the variable present in BAR but missing in FOO is your p-value, and it's probably close to 1. The square of its t-statistic is the F-value.
  • If the models are nested and this is an analysis you are doing programmatically and the absence of a p-value breaks stuff elsewhere in your script, just do anova(FOO,BAR)[,5:6] to get NAs instead of blanks (see the sketch after this list)... but then again, if you were doing it programmatically you would have already tried that.
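
To make those bullets concrete, here is a minimal sketch on made-up data. FOO and BAR are the placeholder names from above; x1, x2, and the is_nested helper are illustrative names of my own, not anything from the original question.

 ## Illustrative data: x2 contributes nothing to y by construction
 set.seed(42)
 x1 <- rnorm(130)
 x2 <- rnorm(130)
 y  <- 1 + 2 * x1 + rnorm(130)

 FOO <- lm(y ~ x1)        # smaller model
 BAR <- lm(y ~ x1 + x2)   # larger model: its terms are a superset of FOO's

 ## A rough nestedness check on the term labels (hypothetical helper;
 ## it does not verify that both models use the same data and response)
 is_nested <- function(small, big)
   all(attr(terms(small), "term.labels") %in% attr(terms(big), "term.labels"))
 is_nested(FOO, BAR)   # TRUE, so the default anova F test is meaningful

 ## The squared t statistic of the extra variable in the larger model
 ## equals the F statistic that anova reports for the comparison
 summary(BAR)$coefficients["x2", "t value"]^2
 anova(FOO, BAR)

 ## Indexing the F and Pr(>F) columns returns NA where the printed table
 ## shows blanks, which is easier to handle in a script
 anova(FOO, BAR)[, 5:6]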

Good luck!

OTHER TIPS

Recently, I also came across this issue when comparing a segmented linear model with one breakpoint (fitted with the segmented package) to a linear model without breakpoints. The simple linear model is nested within the segmented one, because the segment before the breakpoint could span the entire data set.

The segmented fit, however (which I had invoked with a lax convergence tolerance for performance reasons), reported residual sums of squares for the more complex segmented model that were slightly larger than those of the simple linear model. Of course, the best fit of the more complex nested model should not have larger residual variance, and the anova function reported a p-value of NA.

In this case the more complex model was clearly not significantly better: p > alpha, effectively p = 1.
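
For what it's worth, here is a minimal sketch of that comparison. The data, the starting breakpoint, and the tolerance value are all made up, and it assumes the segmented package is installed; whether the segmented fit actually ends up with the larger RSS depends on the data and on how loose the tolerance is.

 library(segmented)

 set.seed(1)
 x <- 1:100
 y <- 2 + 0.5 * x + rnorm(100)   # purely linear data, no true breakpoint

 fit_lm <- lm(y ~ x)
 ## One-breakpoint segmented fit; seg.control(toll = ...) loosens the
 ## convergence tolerance, as described above
 fit_seg <- segmented(fit_lm, seg.Z = ~ x, psi = 50,
                      control = seg.control(toll = 0.1))

 ## If the loosely converged segmented fit ends up with a larger RSS than
 ## the plain linear fit, the F statistic would be negative and anova
 ## reports Pr(>F) as NA: the more complex model is not significantly better
 anova(fit_lm, fit_seg)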

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow