Background: When we test the significance of a categorical variable that has been coded as dummy variables, we need to simultaneously test all dummy variables are 0. For example, if X takes on values of 0, 1, 2, 3 and 4, I would fit dummy variables for levels 1-4 (assuming I want 0 to be baseline), then want to simultaneously test B1=B2=B3=B4=0.
If this is the only variable in my data set, I can use the overall F-statistic to achieve this. However, if I have other covariates, the overall F-test doesn't work.
In Stata, for example, this is (very, very) simply carried out by the testparm
command as:
testparm i.x
(after fitting the desired regression model), where the i.
prefix tells Stata X is a categorical data to be treated as dummy variables.
Question/issue: I'm wondering how I can do this in SAS with a CONTRAST
(or ESTIMATE
?) statement while fitting a regression model with PROC GLM.
Since I have scoured the internet and haven't found what I'm looking for, I'm guessing I'm missing something very obvious. However, all of the examples I've seen are NOT for categorical (class
) variables, but rather two separate (say continuous) variables. The contrast statement in that case would simply be something like
CONTRAST 'Contrast1' y 1 z 1;
Otherwise, they're for calculating hypotheses like H_0: B1-B2=0.
I feel like I need to breakdown the hypotheses into smaller pieces and determine that set that defines the whole relationship, but I'm not doing it correctly. For example, for B1=B2=B3=B4=0, I thought I might say B1=B2=B3=-B4, then define (1) B1=-B4, (2) B2=-B4 and (3) B2=B3. I was trying to code this as a CONTRAST
statement as (say X is in descending order in data set: 4-0):
CONTRAST 'Contrast' x -1 0 0 1 0
x -1 0 1 0 0
x 0 1 1 0 0;
I know this is not correct, and I tried many, many variations and whatever random logic I could come up with. My problem is I have relatively novice-level knowledge of CONTRAST
(and unfortunately have not found great documentation to help with this) and also of how this hypothesis test should really be formulated for the sake of estimation (do I try to split it up into pieces as I did above, or...?).