I was thinking about posting my question on Cross Validated, but decided to come here. I am using the multinom() function from the nnet package to estimate the odds of being employed, unemployed, or out of the labor force conditional on age and education. I need some help with the interpretation.
I have the following dataset with one categorical dependent variable, employment status (EmpSt), and two categorical independent variables: age (Age) and education level (Education).
> head(df)
               EmpSt   Age                         Education
1           Employed   61+   Less than a high school diploma
2           Employed 50-60 High school graduates, no college
3 Not in labor force 50-60   Less than a high school diploma
4           Employed 30-39       Bachelor's degree or higher
5           Employed 20-29  Some college or associate degree
6           Employed 20-29  Some college or associate degree
Here is the summary with the levels:
> summary(df)
                EmpSt          Age                                     Education
 Not in universe   :    0   16-19: 6530   Less than a high school diploma  :14686
 Employed          :61478   20-29:16031   High school graduates, no college:30716
 Unemployed        : 3940   30-39:16520   Some college or associate degree :28525
 Not in labor force:38508   40-49:17403   Bachelor's degree or higher      :29999
                            50-60:20779
                            61+  :26663
- First, what is the estimation equation (model)?
I want to determine the estimation equation (model) for the call
df$EmpSt <- relevel(df$EmpSt, ref = "Employed")
test <- multinom(EmpSt ~ Age + Education, data = df)
so that I can write it down in my research paper. As I understand it, Employed is the base level and the logit model for this call is:
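Written out in LaTeX, this is a sketch of what I have in mind. It assumes the default dummy (treatment) coding, so that Age 16-19 and "Less than a high school diploma" are the reference levels (they are the ones missing from the coefficient names in the output). The symbols \beta, \gamma, A_i and E_n are my own notation, not something multinom() prints:

```latex
\ln\frac{P(\mathrm{EmpSt}=j)}{P(\mathrm{EmpSt}=\mathrm{Employed})}
  = \beta_{0j} + \sum_{i} \beta_{ij}\,A_i + \sum_{n} \gamma_{nj}\,E_n,
\qquad j \in \{\mathrm{Unemployed},\ \mathrm{Not\ in\ labor\ force}\}
```

Here A_i and E_n are 0/1 indicators for the non-reference age and education categories,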
where i and n are the categories of the variables age and education respectively (sorry for the confusing notation). Please correct me if my understanding of the logistic model produced by multinom() is incorrect. I am not going to include the output of summary(test) because it is long, so below I include only the output of printing test:
> test
Call:
multinom(formula = EmpSt ~ Age + Education, data = ml)
Coefficients:
                   (Intercept)   Age20-29   Age30-39   Age40-49   Age50-60     Age61+
Unemployed           -1.334734 -0.3395987 -0.7104361 -0.8848517 -0.9358338 -0.9319822
Not in labor force    1.180028 -1.2531405 -1.6711616 -1.6579095 -1.2579600  0.8197373
                   EducationHigh school graduates, no college EducationSome college or associate degree
Unemployed                                         -0.4255369                                 -0.781474
Not in labor force                                 -0.8125016                                 -1.004423
                   EducationBachelor's degree or higher
Unemployed                                    -1.351119
Not in labor force                            -1.580418
Residual Deviance: 137662.6
AIC: 137698.6
If my understanding of the logit model produced by multinom() is correct, the coefficients are on the log-odds scale, with Employed as the base level. To get back to the odds scale I exponentiate them with the call exp(coef(test)),
which gives me:
> exp(coef(test))
                   (Intercept)  Age20-29  Age30-39  Age40-49  Age50-60    Age61+
Unemployed           0.2632281 0.7120560 0.4914298 0.4127754 0.3922587 0.3937724
Not in labor force   3.2544655 0.2856064 0.1880285 0.1905369 0.2842333 2.2699035
                   EducationHigh school graduates, no college EducationSome college or associate degree
Unemployed                                          0.6534189                                 0.4577308
Not in labor force                                  0.4437466                                 0.3662560
                   EducationBachelor's degree or higher
Unemployed                                    0.2589504
Not in labor force                            0.2058891
which brings me to my next question.
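Before moving on, a quick sanity check on how I am reading the exponentiated table. My assumption is that the exponentiated intercept is the odds of Unemployed versus Employed for the reference profile (Age 16-19, less than a high school diploma) and that the other entries are multiplicative factors (odds ratios) applied to it. The variable names below are just mine; the numbers are copied from the output in this post:

```r
# Unemployed row, values copied from exp(coef(test))
base_odds <- 0.2632281  # exp(Intercept): Unemployed vs Employed at the reference levels
or_age    <- 0.7120560  # Age20-29 relative to Age 16-19
or_edu    <- 0.6534189  # High school graduates relative to less than high school

# Odds of Unemployed vs Employed for a 20-29 year old high school graduate
odds_unemp <- base_odds * or_age * or_edu

# The same quantity from the raw coefficients (log-odds scale)
log_odds <- -1.334734 + -0.3395987 + -0.4255369

# The two routes should agree up to the rounding of the printed values
print(odds_unemp)
print(exp(log_odds))
print(all.equal(odds_unemp, exp(log_odds), tolerance = 1e-5))
```

If this reading is right, multiplying the exponentiated entries is equivalent to summing the raw coefficients and exponentiating once.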
- Second, the probabilities
I wonder if there is a way to get the actual probabilities of being unemployed vs. employed for a given combination of age and education, e.g., what is the probability of being unemployed if I am 22 and have a high school diploma? Sorry for the lengthy question. Thanks for your help. Let me know if additional clarification is needed.
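To make the second question concrete, here is a minimal sketch of the hand calculation I would attempt. It assumes the usual multinomial logit formula with Employed as the baseline, and that a 22-year-old falls in the 20-29 group. The object names (coefs, eta, probs) are mine, and the coefficients are copied from the output above:

```r
# Coefficients for the profile Age 20-29, high school graduate, no college.
# Rows are the two non-baseline outcomes; Employed is the baseline.
coefs <- rbind(
  "Unemployed"         = c(intercept = -1.334734, age20_29 = -0.3395987, edu_hs = -0.4255369),
  "Not in labor force" = c(intercept =  1.180028, age20_29 = -1.2531405, edu_hs = -0.8125016)
)

# Both dummies equal 1 for this profile, so each linear predictor is a row sum
eta <- rowSums(coefs)

# Baseline outcome has linear predictor 0, i.e. exp(0) = 1
probs <- c(Employed = 1, exp(eta)) / (1 + sum(exp(eta)))
print(round(probs, 4))

# With the fitted object, I would expect the same numbers from something like:
# predict(test,
#         newdata = data.frame(Age = "20-29",
#                              Education = "High school graduates, no college"),
#         type = "probs")
```

By this calculation the probability of being unemployed for that profile comes out at about 0.08, but I would like confirmation that this is the right way to go from the coefficients to probabilities.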