Question

In my data file I select a random sample of a fixed size using Select Cases. Say I have 400 cases and randomly pick 150. Every case has an AGE and a SEX value. I now want to test the AGE and SEX distribution of the sample (150 cases) against the AGE and SEX distribution of the rest (250 cases) to check whether my sample is representative of the population.

My solution is to compute two new variables, each filled only for cases in the sample or only for cases in the rest. Here for age:

IF (filter_$ EQ 1) sample_age = age.
IF (filter_$ EQ 0) rest_age = age.
EXECUTE .

How do I then perform a test on sample_age and rest_age? Which test would be appropriate?

The data look like this:

person    sample_age    rest_age
1                 29           .
2                 56           .
3                  .          34
4                  .          12
5                 65           .

Solution

There is no need to create new variables padded with missing values. Assuming you still have the filter_$ variable that identifies the two groups (1 = sample, 0 = rest), use it directly as the grouping variable. For the continuous age variable, you can run an independent-samples t-test.

T-TEST GROUPS = filter_$ (1 0)
  /VARIABLES=age.
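
One thing to check first: if the sample was drawn through Select Cases with the unselected cases filtered out rather than deleted (an assumption about your setup), the filter is probably still active. Procedures then only see the 150 sampled cases, and the 0 group comes up empty. A minimal sketch that switches the filter off before running the test:

* Assumption: the filter from Select Cases is still on.
FILTER OFF.
USE ALL.
* Compare mean age between sample (1) and rest (0).
T-TEST GROUPS=filter_$(1 0)
  /VARIABLES=age.

The output includes Levene's test plus one row assuming equal variances and one not assuming them, so you can read whichever row fits your data.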

For sex, which is categorical, you can run CROSSTABS and request the chi-square statistic.

CROSSTABS 
  /TABLES = filter_$ BY sex 
  /STATISTICS=CHISQ.
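
To see the sex split within each group next to the test result, row percentages can be requested as well. A sketch, assuming any Select Cases filter has already been switched off (FILTER OFF.) so that both groups are in the table:

* Counts and row percentages of sex within sample (1) and rest (0).
CROSSTABS
  /TABLES=filter_$ BY sex
  /CELLS=COUNT ROW
  /STATISTICS=CHISQ.

If the row percentages are similar in the two groups and the chi-square test is non-significant, there is no evidence that the sample differs from the rest on sex.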
Licensed under: CC-BY-SA with attribution