Assesing the goodness of fit for the multinomial logit in R with the nnet package

https://stackoverflow.com//questions/23018238

21-12-2019
|

سؤال

I use the multinom() function from the nnet package to run the multinomial logistic regression in R. The nnet package does not include p-value calculation and t-statistic calculation. I found a way to calculate the p-values using the two tailed z-test from this page. To give one example of calculating a test statistic for a multinom logit (not really a t-stat, but an equivalent) I calculate the Wald's statistic:

mm<-multinom(Empst ~ Agegroup + Marst + Education + State, 
             data = temp,weight=Weight)
W <- (summary(mm1)$coefficients)^2/(summary(mm1)$standard.errors)^2

I take the square of a coefficient and divide by the square of the coefficient's standard error. However, the likelihood-ratio test is the preferable measure of a goodness of fit for the logistic regressions. I do not know how to write code that will calculate the likelihood ratio statistic for each coefficient due to the incomplete understanding of the likelihood function. What would be the way to calculate the likelihood-ratio statistic for each coefficient using the output from the multinom() function? Thanks for your help.

المحلول

Let's look at predicting Sepal.Length from the iris dataset using Species (a categorical variable) and Petal.Length (a continuous variable). Let's start by converting our factor variable into multiple binary variables using model.matrix and building our neural network:

library(nnet)
data(iris)
mat <- as.data.frame(model.matrix(~Species+Petal.Length+Sepal.Length, data=iris))
mm <- multinom(Sepal.Length~.+0, data=mat, trace=F)

Now we can run a likelihood ratio test for a variable in our model:

library(lmtest)
lrtest(mm, "Speciesversicolor")
# Likelihood ratio test
# 
# Model 1: Sepal.Length ~ `(Intercept)` + Speciesversicolor + Speciesvirginica + 
#     Petal.Length + 0
# Model 2: Sepal.Length ~ `(Intercept)` + Speciesvirginica + Petal.Length - 
#     1
#   #Df  LogLik  Df  Chisq Pr(>Chisq)
# 1 136 -342.02                      
# 2 102 -346.75 -34 9.4592          1

To run the likelihood ratio test for all your variables, I guess you could just use a loop and run for each variable name. I've extracted just the p-values in this loop.

for (var in mm$coefnames[-1]) {
  print(paste(var, "--", lrtest(mm, var)[[5]][2]))
}
# [1] "Speciesversicolor -- 0.999990077592342"
# [1] "Speciesvirginica -- 0.998742545590864"
# [1] "Petal.Length -- 3.36995663002528e-14"

نصائح أخرى

Use the Anova function in the car package for the likelihood-ratio test of each term in your model.

library(nnet)
data(iris)


mm <- multinom(Species ~ ., data=iris, trace=F)

### car package
library(car)
Anova(mm)

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow