문제

How can Levene's test be performed using the squared residuals rather than the absolute?

I have tried levene.test in the lawstat package, and leveneTest in the car package, which both use the absolute residuals.

The purpose is to reproduce SAS output, which uses squared residuals by default.

도움이 되었습니까?

해결책

iris.lm <- lm(Petal.Width ~ Species, data = iris)
anova(lm(residuals(iris.lm)^2 ~ iris$Species))
## Analysis of Variance Table
## 
## Response: residuals(iris.lm)^2
##               Df Sum Sq Mean Sq F value  Pr(>F)    
## iris$Species   2  0.100  0.0500    14.8 1.4e-06 ***
## Residuals    147  0.497  0.0034                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Perhaps it helps to understand how this works.

As is pointed out out here, Levene's test is just an ANOVA of the distance between each observation and the centre of its group. Different implementations of Levene's test differ by their definitions of “distance” and “centre”.

“Distance” can mean the absolute differences, or the squared differences.

“Centre” can mean the mean or the median.

SAS uses squared differences by default, and the mean. leveneTest in the car package of R uses the absolute differences only, and the median by default. So does levene.test in the lawstat package.

All four possible combinations can be done by hand as follows.

require(plyr)
x <- ddply(iris, .(Species), summarize
            , abs.mean = abs(Petal.Width - mean(Petal.Width))
            , abs.median = abs(Petal.Width - median(Petal.Width))
            , squared.mean = (Petal.Width - mean(Petal.Width))^2
            , squared.median = (Petal.Width - median(Petal.Width))^2)

anova(lm(abs.mean ~ Species, data = x)) # Levene's test
## Analysis of Variance Table
## 
## Response: abs.mean
##            Df Sum Sq Mean Sq F value  Pr(>F)    
## Species     2   0.53  0.2648    19.6 2.7e-08 ***
## Residuals 147   1.98  0.0135                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lm(abs.median ~ Species, data = x)) # Brown-Forsythe test
## Analysis of Variance Table
## 
## Response: abs.median
##            Df Sum Sq Mean Sq F value  Pr(>F)    
## Species     2  0.642   0.321    19.9 2.3e-08 ***
## Residuals 147  2.373   0.016                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lm(squared.mean ~ Species, data = x)) # default SAS Levene's Test
## Analysis of Variance Table
## 
## Response: squared.mean
##            Df Sum Sq Mean Sq F value  Pr(>F)    
## Species     2  0.100  0.0500    14.8 1.4e-06 ***
## Residuals 147  0.497  0.0034                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lm(squared.median ~ Species, data = x)) # Who-Knows-Whose Test
## Analysis of Variance Table
## 
## Response: squared.median
##            Df Sum Sq Mean Sq F value  Pr(>F)    
## Species     2  0.096  0.0478    13.6 3.7e-06 ***
## Residuals 147  0.515  0.0035                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

To show that the first two above reproduce leveneTest:

require(car)
leveneTest(Petal.Width ~ Species, data = iris, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
##        Df F value  Pr(>F)    
## group   2    19.6 2.7e-08 ***
##       147                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

leveneTest(Petal.Width ~ Species, data = iris, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)    
## group   2    19.9 2.3e-08 ***
##       147                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since one often has a linear model to hand with the residuals (mean) ready to go, it's often more convenient to do

iris.lm <- lm(Petal.Width ~ Species, data = iris)
anova(lm(residuals(iris.lm)^2 ~ iris$Species))
## Analysis of Variance Table
## 
## Response: residuals(iris.lm)^2
##               Df Sum Sq Mean Sq F value  Pr(>F)    
## iris$Species   2  0.100  0.0500    14.8 1.4e-06 ***
## Residuals    147  0.497  0.0034                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hence the answer at the top.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top