Question

I hope this is an easy question but I'm having trouble creating SPSS syntax for it.

I have a dataset with a single variable and about 200 cases. I need to compute the mean of that variable 200 times, each time with a different case left out (and then put back). In other words, the first mean should exclude the first case (so cases 2 through 200 are analyzed), the second should exclude the second case but include the first (so cases 1 and 3 through 200 are analyzed), and so on.

Ideally what I would like to do is create a new SPSS dataset, such that the only variable in this new dataset contains these 200 means. I believe the best way to do this is through the aggregate function.

What I am having trouble with is how to remove each case, compute the mean, replace the case, compute the mean again with another case removed, and so on. I could do this with a filter, but I would like to automate it rather than having to copy/paste or change the syntax each time. I am thinking of some kind of repeating filter, but I am not very familiar with repeat and loop commands (but working on it...).

Any insight or help on the best way to create a filter like this would be much appreciated.

Solution

I was correct in my comment: you can use the deletion statistics available in the REGRESSION procedure to get the information you need without looping through the dataset yourself.

What you have to do is compute your own constant variable equal to 1 and force the REGRESSION through the origin, predicting your variable of interest from that constant (SPSS does not let you specify an empty regression equation). Then have the regression procedure save the deletion residuals. Subtracting each deletion residual from the original value gives the jackknifed mean, i.e. the mean with that observation deleted.
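To see why this works, here is a short derivation. The notation is mine rather than the original answer's: n is the number of cases, x_i the value for case i, c_j = 1 the constant regressor, and \bar{x}_{(-i)} the mean of the other n - 1 cases. It assumes DRESID is the usual deletion residual, i.e. the residual for a case when that case is left out of the coefficient estimate.

```latex
\begin{align*}
% Slope of the through-the-origin fit on the constant, estimated without case i:
\hat{b}_{(-i)} &= \frac{\sum_{j \neq i} c_j x_j}{\sum_{j \neq i} c_j^2}
               = \frac{1}{n-1} \sum_{j \neq i} x_j = \bar{x}_{(-i)} \\
% Deletion residual for case i:
d_i &= x_i - \hat{b}_{(-i)}\, c_i = x_i - \bar{x}_{(-i)} \\
% Rearranging gives the jackknifed mean:
x_i - d_i &= \bar{x}_{(-i)}
\end{align*}
```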

So in a nutshell this code would provide that info - just replace X with your variable of interest.

COMPUTE Const = 1.
REGRESSION
  /ORIGIN 
  /DEPENDENT X
  /METHOD=ENTER Const
  /SAVE DRESID (MeanResid).
COMPUTE JackknifeMeanX = X - MeanResid.

Full example (with fake data and checking via aggregate) is below:

INPUT PROGRAM.
LOOP Id = 1 TO 10.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME Sim.
COMPUTE X = RV.NORMAL(10,5).
COMPUTE Const = 1.
FORMATS Id Const (F2.0).
EXECUTE.

*Using deletion residuals in linear regression to calculate Jackknifed mean.
*Here I calculate my own intercept and force through origin.
REGRESSION
  /ORIGIN 
  /DEPENDENT X
  /METHOD=ENTER Const
  /SAVE DRESID (MeanResid).
COMPUTE JackknifeMeanX = X - MeanResid.

*Checking to make sure this agrees with data.
VECTOR XMis(10).
LOOP #i = 1 TO 10.
  IF $casenum <>#i XMis(#i) = X.
END LOOP.
AGGREGATE OUTFILE = * OVERWRITE=YES MODE=ADDVARIABLES
  /BREAK
  /XMis1 TO XMis10=MEAN(XMis1 TO XMis10).

OTHER TIPS

Following some of the discussion of this same question over on the SPSS Google Group, I wrote a macro to compute the jackknifed means and variances based on that advice.

*This macro calculates the jackknifed mean and variance.
*It also returns the total mean (GrandMean) and total variance (GrandVar).
*All variance calculations use the sample (N-1) formula.

*The parameters it takes are:.
 *Var - Original variable you want calculated.
 *JMean - name of the resulting jackknifed mean (DEFAULT JackMean).
 *VarCalc - flag for if you want the second data pass to calculate Jackknifed variance
            can take either Yes or Y (case does not matter) default is No.
 *JVar - name of the resulting jackknifed variance (DEFAULT JackVar).

DEFINE !JackMeanVar (Var = !TOKENS(1)
                    /JMean = !DEFAULT (JackMean) !TOKENS(1)
                    /VarCalc = !DEFAULT (No) !TOKENS(1)
                    /JVar = !DEFAULT (JackVar) !TOKENS(1) )
*Calculate grand mean and N.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK
  /GrandSum=SUM(!Var)
  /GrandMean=MEAN(!Var)
  /TotalN=N. 
*Compute Jackknife mean.
COMPUTE !JMean=(GrandSum-!Var)/(TotalN - 1).
*Compute grand contribution to variance.
!IF (!UPCASE(!VarCalc)="YES" !OR !UPCASE(!VarCalc)="Y") !THEN
COMPUTE Vi = !Var**2.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK
  /GrandVar=SD(!Var)
  /GrandV=SUM(Vi).
*Computing full set variance (sample, N-1) by squaring the SD.
COMPUTE GrandVar = GrandVar**2.
*COMPUTE GVar = (GrandV-(GrandSum**2/TotalN))/(TotalN - 1).
*Subtract out local contribution.
COMPUTE !JVar= ((GrandV - Vi) - (GrandSum-!Var)**2/(TotalN -1))/(TotalN - 2).
*Label the variance outputs here so the labels are skipped when VarCalc is No.
VARIABLE LABELS
  GrandVar 'Variance (N-1) for all cases'
  !JVar 'Variance (N-1) with this observation left out'.
*Clean Up.
MATCH FILES FILE = * /DROP Vi GrandV.
!IFEND
*Clean Up.
MATCH FILES FILE = * /DROP GrandSum TotalN.
VARIABLE LABELS 
  GrandMean 'Mean for Total Population'
  !JMean 'Mean with this observation left out'.
!ENDDEFINE.
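The jackknifed variance line in the macro can be checked by hand. In the sketch below the symbols are my own shorthand, not names from the macro: S stands for GrandSum, Q for GrandV (the sum of squared values), and n for TotalN. The worked numbers assume the twelve values 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4.

```latex
\begin{align*}
% Leaving out case i removes x_i from the sum and x_i^2 from the sum of squares:
\bar{x}_{(-i)} &= \frac{S - x_i}{n - 1} \\
s^2_{(-i)} &= \frac{(Q - x_i^2) - (S - x_i)^2/(n-1)}{n-2} \\
% Worked example: n = 12, S = 30, Q = 90, for a case with x_i = 1:
\bar{x}_{(-i)} &= \frac{29}{11} \approx 2.636 \\
s^2_{(-i)} &= \frac{89 - 841/11}{10} = \frac{69}{55} \approx 1.255
\end{align*}
```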

An example of this macro in use, and a more roundabout way to check the calculations using SPSS's AGGREGATE command, is below.

*Test it out.
DATA LIST FREE / X.
BEGIN DATA
1 1 1 2 2 2 3 3 3 4 4 4
END DATA.

!JackMeanVar Var=X JMean = MeanJ VarCalc=Yes JVar = VarJ.
EXECUTE.

*Checking calculations.
VECTOR CheckM(12).
LOOP #i = 1 TO 12.
  IF $casenum<>#i CheckM(#i)=X.
END LOOP.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
  /BREAK=
  /CheckM1 TO CheckM12=MEAN(CheckM1 TO CheckM12)
  /CheckV1 TO CheckV12=SD(CheckM1 TO CheckM12).
VECTOR CheckM = CheckM1 TO CheckM12.
VECTOR CheckV = CheckV1 TO CheckV12.
LOOP #i = 1 TO 12.
  DO IF $casenum = #i. 
    COMPUTE MeanCheck = CheckM(#i).
    COMPUTE VarCheck = CheckV(#i)**2.
  END IF.
END LOOP.
MATCH FILES FILE = * /DROP CheckM1 TO CheckV12.
EXECUTE.

I suggest a simple workaround: first add the mean of all 200 cases to the file, then recalculate the mean for every case with that case's value removed:

DATA LIST FREE / OrigVar.
BEGIN DATA
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
END DATA.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /meanAll=MEAN(OrigVar)
  /Ncases=N.
COMPUTE MeanWithoutThisVal = (Ncases * meanAll - OrigVar) / (Ncases - 1).
EXECUTE.

This example only has 30 cases but the syntax will work with any number of cases.
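As a quick hand check of the formula on the 30 example values, here is the arithmetic for the first and last cases. The symbols n and \bar{x} are my notation for Ncases and meanAll.

```latex
\begin{align*}
\bar{x}_{(-i)} &= \frac{n\bar{x} - x_i}{n - 1},
  \qquad n = 30,\quad \bar{x} = 15.5,\quad n\bar{x} = 465 \\
x_i = 1:\quad \bar{x}_{(-i)} &= \frac{465 - 1}{29} = 16 \\
x_i = 30:\quad \bar{x}_{(-i)} &= \frac{465 - 30}{29} = 15
\end{align*}
```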

Licensed under: CC-BY-SA with attribution