Question

I have got a question regarding the ordinal package in R or specifically regarding the predict.clm() function. I would like to calculate the linear predictor of an ordered probit estimation. With the polr function of the MASS package the linear predictor can be accessed by object$lp. It gives me on value for each line and is in line with what I understand what the linear predictor is namely X_i'beta. If I however use the predict.clm(object, newdata,"linear.predictor") on an ordered probit estimation with clm() I get a list with the elements eta1 and eta2,

  1. with one column each, if the newdata contains the dependent variable
  2. where each element contains as many columns as levels in the dependent variable, if the newdata doesn't contain the dependent variable

Unfortunately I don't have a clue what that means. Also in the documentations and papers of the author I don't find any information about it. Would one of you be so nice to enlighten me? This would be great.

Cheers,

AK

Was it helpful?

Solution

UPDATE (after comment):

Basic clm model is defined like this (see clm tutorial for details):

Generating data:

library(ordinal)
set.seed(1)
test.data = data.frame(y=gl(4,5),
                       x=matrix(c(sample(1:4,20,T)+rnorm(20), rnorm(20)), ncol=2))
head(test.data) # two independent variables 
test.data$y     # four levels in y

Constructing models:

fm.polr <- polr(y ~ x) # using polr
fm.clm  <- clm(y ~ x)  # using clm

Now we can access thetas and betas (see formula above):

# Thetas
fm.polr$zeta # using polr
fm.clm$alpha # using clm
# Betas
fm.polr$coefficients # using polr
fm.clm$beta          # using clm

Obtaining linear predictors (only parts without theta on the right side of the formula):

fm.polr$lp                                                 # using polr
apply(test.data[,2:3], 1, function(x) sum(fm.clm$beta*x))  # using clm

New data generation:

# Contains only independent variables
new.data <- data.frame(x=matrix(c(rnorm(10)+sample(1:4,10,T), rnorm(10)), ncol=2))
new.data[1,] <- c(0,0)  # intentionally for demonstration purpose
new.data

There are four types of predictions available for clm model. We are interested in type=linear.prediction, which returns a list with two matrices: eta1 and eta2. They contain linear predictors for each observation in new.data:

lp.clm <- predict(fm.clm, new.data, type="linear.predictor")
lp.clm

Note 1: eta1 and eta2 are literally equal. Second is just a rotation of eta1 by 1 in j index. Thus, they leave left side and right side of linear predictor scale opened respectively.

all.equal(lp.clm$eta1[,1:3], lp.clm$eta2[,2:4], check.attributes=FALSE)
# [1] TRUE

Note 2: Prediction for first line in new.data is equal to thetas (as far as we set this line to zeros).

all.equal(lp.clm$eta1[1,1:3], fm.clm$alpha, check.attributes=FALSE)
# [1] TRUE

Note 3: We can manually construct such predictions. For instance, prediction for second line in new.data:

second.line <- fm.clm$alpha - sum(fm.clm$beta*new.data[2,])
all.equal(lp.clm$eta1[2,1:3], second.line, check.attributes=FALSE)
# [1] TRUE

Note 4: If new.data contains response variable, then predict returns only linear predictor for specified level of y. Again we can check it manually:

new.data$y <- gl(4,3,length=10)
lp.clm.y <- predict(fm.clm, new.data, type="linear.predictor")
lp.clm.y

lp.manual <- sapply(1:10, function(i) lp.clm$eta1[i,new.data$y[i]])
all.equal(lp.clm.y$eta1, lp.manual)
# [1] TRUE
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top