Question

I'm getting different results from R and SAS when I try to calculate a weighted variance. Does anyone know what might be causing this difference?

I create vectors of weights and values, then calculate the weighted variance with the wtd.var function from the Hmisc package:

library(Hmisc)
wt <- c(5, 5, 4, 1)
x <- c(3.7, 3.3, 3.5, 2.8)
wtd.var(x, weights = wt)

I get an answer of:

[1] 0.0612381

But when I try to reproduce this in SAS, I get quite a different result:

data test;
  input wt x;
cards;
5 3.7
5 3.3
4 3.5
1 2.8
;
run;
proc means data=test var;
var x;
weight wt;
run;

This results in an answer of:

0.2857778

Solution

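The difference is in the divisor used to calculate the variance. Both programs compute the same weighted corrected sum of squares, but by default PROC MEANS divides it by n - 1 (here 3), while Hmisc's wtd.var, with its default arguments, divides it by the sum of the weights minus one (here 14). SAS lets you change the divisor with the VARDEF= option: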

proc means data=test var vardef=WDF;
var x;
weight wt;
run;

On your dataset, that gives the same variance as R (0.0612381). Both are 'right', depending on how you choose to calculate the weighted variance. (At my shop we calculate it a third way, of course...)
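One way to see why R's number is also 'right': with its default arguments, wtd.var behaves as if the weights were frequency (replication) counts. A minimal base-R sketch, assuming integer weights like the ones in the question, expands each value by its weight and takes the ordinary sample variance, which divides by (sum of weights) - 1 exactly as VARDEF=WDF does. The variable name `expanded` is just illustrative.

```r
wt <- c(5, 5, 4, 1)
x <- c(3.7, 3.3, 3.5, 2.8)

# Repeat each value wt[i] times: 5 + 5 + 4 + 1 = 15 observations in total
expanded <- rep(x, times = wt)

# Ordinary (unweighted) sample variance of the expanded data, divisor 15 - 1 = 14
var(expanded)
# [1] 0.0612381
```

This frequency-weight reading only makes sense for integer weights; for other kinds of weights, which divisor to use is a modelling choice, which is what VARDEF= exposes.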

The complete text from the PROC MEANS documentation:

VARDEF=divisor specifies the divisor to use in the calculation of the variance and standard deviation. The following table shows the possible values for divisor and associated divisors.

Possible Values for VARDEF=
Value            Divisor                     Formula for Divisor
DF               degrees of freedom          $n - 1$
N                number of observations      $n$
WDF              sum of weights minus one    $\left(\sum_i w_i\right) - 1$
WEIGHT | WGT     sum of weights              $\sum_i w_i$

The procedure computes the variance as CSS/Divisor, where CSS is the corrected sums of squares and equals $\sum_i (x_i - \bar{x})^2$. When you weight the analysis variables, CSS equals $\sum_i w_i (x_i - \bar{x}_w)^2$, where $\bar{x}_w$ is the weighted mean.
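To see how the four VARDEF= settings play out on the question's data, here is a minimal base-R sketch (no Hmisc needed) that computes the weighted CSS as described above and divides it by each divisor from the table. The names `xwbar`, `css` and `divisors` are illustrative, not part of either product.

```r
wt <- c(5, 5, 4, 1)
x <- c(3.7, 3.3, 3.5, 2.8)

xwbar <- sum(wt * x) / sum(wt)      # weighted mean, about 3.453333
css <- sum(wt * (x - xwbar)^2)      # weighted corrected sum of squares, about 0.8573333

n <- length(x)
divisors <- c(DF  = n - 1,          # PROC MEANS default
              N   = n,
              WDF = sum(wt) - 1,    # same value as wtd.var(x, weights = wt) in the question
              WGT = sum(wt))

# Variance under each VARDEF= setting:
# DF: 0.2857778, N: 0.2143333, WDF: 0.0612381, WGT: 0.0571556
round(css / divisors, 7)
```

The DF and WDF values reproduce the SAS default and the R result from the question, respectively; the only thing that changes between them is the divisor.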

Default: DF

Requirement: To compute the standard error of the mean, confidence limits for the mean, or the Student's t-test, use the default value of VARDEF=.

Tip: When you use the WEIGHT statement and VARDEF=DF, the variance is an estimate of $\sigma^2$, where the variance of the ith observation is $\sigma^2 / w_i$ and $w_i$ is the weight for the ith observation. This method yields an estimate of the variance of an observation with unit weight.
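The VARDEF=DF tip can be checked by simulation. A minimal sketch, assuming independent normal observations with a common mean and $\mathrm{Var}(x_i) = \sigma^2 / w_i$; the values $\sigma^2 = 2$, mean 3.5, the seed and the number of replications are arbitrary illustrative choices. Averaged over many simulated datasets, CSS / (n - 1) should come out close to $\sigma^2$.

```r
set.seed(1)                         # arbitrary seed, for reproducibility
wt <- c(5, 5, 4, 1)                 # weights from the question
sigma2 <- 2                         # assumed variance of a unit-weight observation
mu <- 3.5                           # assumed common mean
nsim <- 100000                      # number of simulated datasets

vardef_df <- replicate(nsim, {
  # observation i has variance sigma2 / wt[i]
  x <- rnorm(length(wt), mean = mu, sd = sqrt(sigma2 / wt))
  xwbar <- sum(wt * x) / sum(wt)                # weighted mean
  sum(wt * (x - xwbar)^2) / (length(wt) - 1)    # VARDEF=DF estimate
})

mean(vardef_df)                     # should be close to sigma2 = 2
```

Dividing the same CSS by the sum of the weights instead would average to a much smaller number here, because with these weights the unit-weight variance $\sigma^2$ is not what that divisor targets.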

Tip: When you use the WEIGHT statement and VARDEF=WGT, the computed variance is asymptotically (for large n) an estimate of $\sigma^2 / \bar{w}$, where $\bar{w}$ is the average weight. This method yields an asymptotic estimate of the variance of an observation with average weight.
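Why "asymptotically" can be seen from a short expectation calculation. This is a sketch under the same model the tips assume (independent observations with common mean $\mu$ and $\mathrm{Var}(x_i) = \sigma^2 / w_i$), writing $W = \sum_i w_i$ and $\bar{w} = W / n$:

```latex
\begin{aligned}
\sum_i w_i (x_i - \bar{x}_w)^2 &= \sum_i w_i (x_i - \mu)^2 - W(\bar{x}_w - \mu)^2,\\
E\Big[\sum_i w_i (x_i - \mu)^2\Big] &= \sum_i w_i \frac{\sigma^2}{w_i} = n\sigma^2,
\qquad E\big[W(\bar{x}_w - \mu)^2\big] = W \cdot \frac{\sigma^2}{W} = \sigma^2,\\
E[\mathrm{CSS}] &= (n-1)\,\sigma^2,\\
E\Big[\frac{\mathrm{CSS}}{W}\Big] &= \frac{(n-1)\,\sigma^2}{W}
 = \frac{n-1}{n}\cdot\frac{\sigma^2}{\bar{w}}
 \;\longrightarrow\; \frac{\sigma^2}{\bar{w}} \quad (n \to \infty).
\end{aligned}
```

The third line also explains the VARDEF=DF tip: dividing CSS by $n - 1$ gives an estimator whose expectation is exactly $\sigma^2$, while dividing by $W$ is biased by the factor $(n-1)/n$ for any finite n.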

Licensed under: CC-BY-SA with attribution