Regression summary output: Order of categories
22-12-2019
Question
This question is about the way the result of a GLM is printed, that is, the order in which the coefficients are printed. By "order" I'm not referring to any statistical meaning of this term.
The following code fits a linear model:
import pandas as pd
import patsy
import statsmodels.api as sm

# Load the diamonds data set and build the design matrices from the formula
df = pd.read_csv("http://vincentarelbundock.github.io/Rdatasets/csv/ggplot2/diamonds.csv")
y, X = patsy.dmatrices("price ~ cut", data=df)
# Fit a Gaussian GLM and print the coefficient table
print(sm.GLM(y, X, family=sm.families.Gaussian()).fit().summary())
It produces the output below, in which the categories appear in alphabetical order:
(Fair), Good, Ideal, Premium, Very Good
====================================================================================
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
Intercept         4358.7578     98.788     44.122      0.000      4165.137  4552.379
cut[T.Good]       -429.8933    113.849     -3.776      0.000      -653.034  -206.753
cut[T.Ideal]      -901.2158    102.412     -8.800      0.000     -1101.939  -700.493
cut[T.Premium]     225.4999    104.395      2.160      0.031        20.889   430.111
cut[T.Very Good]  -376.9979    105.164     -3.585      0.000      -583.116  -170.880
====================================================================================
What I'm trying to do:
I would like them to be ordered like:
(Fair), Good, Very Good, Premium, Ideal
In R, this would look like:
df = read.table( file = "http://vincentarelbundock.github.io/Rdatasets/csv/ggplot2/diamonds.csv",
sep = ",", header = TRUE)
df$cut = factor( df$cut, levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"))
glm( price ~ cut, data = df, family = gaussian )
Notice the ordering in the output follows the factor ordering:
(Fair), Good, Very Good, Premium, Ideal
Call: glm(formula = price ~ cut, family = gaussian, data = df)
Coefficients:
 (Intercept)       cutGood  cutVery Good    cutPremium      cutIdeal
      4358.8        -429.9        -377.0         225.5        -901.2
How do I do this in Python?
Solution
This is a known issue. I'm sure a PR would be welcome. Maybe continue the conversation here?
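One possible workaround, not part of the original answer: patsy appears to follow the category order of a pandas Categorical column, much as R follows the factor levels. The sketch below assumes that behaviour. To stay self-contained it uses a tiny made-up data frame in place of the diamonds data, and the name cut_levels is only illustrative.

```python
import pandas as pd
import patsy

# Tiny made-up stand-in for the diamonds data (values are illustrative only).
df = pd.DataFrame({
    "cut": ["Fair", "Good", "Very Good", "Premium", "Ideal",
            "Good", "Ideal", "Fair", "Premium", "Very Good"],
    "price": [4300.0, 3900.0, 4000.0, 4600.0, 3500.0,
              3950.0, 3400.0, 4400.0, 4550.0, 3980.0],
})

# Desired level order, mirroring factor(..., levels = ...) in the R code.
cut_levels = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
df["cut"] = pd.Categorical(df["cut"], categories=cut_levels)

# Build the design matrices exactly as in the question.
y, X = patsy.dmatrices("price ~ cut", data=df)

# The design-matrix columns now follow cut_levels, with "Fair" as the reference level.
column_names = X.design_info.column_names
print(column_names)
```

Passing this y and X to sm.GLM as in the question should then list the coefficients in the same order as the design-matrix columns.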
Licensed under: CC-BY-SA with attribution