The key here is that the help file for multinom()
says: "A log-linear model is fitted, with coefficients zero for the first class."
That means the predicted values for the reference class can be calculated directly by taking the coefficients for class "a" to be zero. For example, for the sample row given above, we can calculate the predicted probability for class "a" using the softmax transform:
exp(0 + 0) /
  (exp(0 + 0) +
   exp(0.1756835 + 0.55915795*(-0.3271010)) +
   exp(-0.2513414 + (-0.31274745)*(-0.3271010)) +
   exp(0.1389806 + (-0.12257963)*(-0.3271010)) +
   exp(-0.4034968 + 0.06814379*(-0.3271010)))
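That arithmetic can be checked directly in R. The coefficients below are the ones quoted from the original question's fit (so the result applies to that fit, not to my reproduction further down):

```r
# Coefficients quoted from the original question's fit, with the
# reference class "a" fixed at zero (first entry in each vector).
b0 <- c(0, 0.1756835, -0.2513414, 0.1389806, -0.4034968)      # intercepts
b1 <- c(0, 0.55915795, -0.31274745, -0.12257963, 0.06814379)  # slopes on X
x  <- -0.3271010                                              # X for the sample row

eta <- b0 + b1 * x               # linear predictor, one entry per class
p   <- exp(eta) / sum(exp(eta))  # softmax over the five classes
p[1]                             # predicted probability for class "a"
```

The five probabilities necessarily sum to one, and the first entry is the "a" probability computed by the long-hand expression above.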
or, perhaps more simply, without hard-coding the numbers, we can calculate the entire set of probabilities for the first row of data as:
softMax <- function(x) {
  expx <- exp(x)
  return(expx / sum(expx))
}

coefs <- rbind(c(0, 0), coef(model))
linear.predictor <- as.vector(coefs %*% c(1, -0.3271010))
softMax(linear.predictor)
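One aside, not needed for this small example: if the linear predictors are large, exp() can overflow to Inf and the naive softMax returns NaN. A common, numerically stable variant subtracts the maximum first, which leaves the result unchanged because the constant cancels in the ratio:

```r
# Numerically stable softmax: subtracting max(x) before exponentiating
# does not change the result (the factor exp(-max(x)) cancels in the
# ratio), but it keeps exp() from overflowing for large inputs.
softMaxStable <- function(x) {
  expx <- exp(x - max(x))
  expx / sum(expx)
}

softMaxStable(c(1000, 1001, 1002))  # the naive version returns NaN here
```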
FWIW: the example in the original question does not reproduce exactly for me; the same seed gives me different random deviates. So I have reproduced the example freshly, with my calculations, below.
library(nnet)
set.seed(1)
DF <- data.frame(
  X = as.numeric(rnorm(30)),
  Y = factor(sample(letters[1:5], 30, replace = TRUE)))
DF$Y <- relevel(DF$Y, ref = "a")  # ensure "a" is the reference category
model <- multinom(Y ~ X, data = DF)
coef(model)
## (Intercept) X
## b -0.33646439 1.200191e-05
## c -0.36390688 -1.773889e-01
## d -0.45197598 1.049034e+00
## e -0.01418543 3.076309e-01
DF[1,]
## X Y
## 1 -0.6264538 c
fitted.values(model)[1,]
## a b c d e
## 0.27518921 0.19656378 0.21372240 0.09076844 0.22375617
coefs <- rbind(c(0, 0), coef(model))
linear.predictor <- as.vector(coefs %*% c(1, DF[1, "X"]))
softMax(linear.predictor)
## [1] 0.27518921 0.19656378 0.21372240 0.09076844 0.22375617
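As a final cross-check, the same numbers can be recovered through nnet's own predict method with type = "probs", so the manual softmax really is what multinom() does internally (self-contained, repeating the setup above):

```r
library(nnet)

set.seed(1)
DF <- data.frame(
  X = as.numeric(rnorm(30)),
  Y = factor(sample(letters[1:5], 30, replace = TRUE)))
DF$Y <- relevel(DF$Y, ref = "a")
model <- multinom(Y ~ X, data = DF)

# predict(..., type = "probs") returns one probability per class;
# for row 1 it should agree with fitted.values(model)[1, ].
predict(model, newdata = DF[1, ], type = "probs")
```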