There's an issue for this open. Please discuss alternatives there.
Specifying the form of names for categories generated by patsy/statsmodels 'C'
15-06-2023
Question
By default, Patsy's C seems to generate categories with names of the form C(color, Treatment('White'))[T.Green], at least when used in a formula provided to statsmodels ols. Is there a way to specify that C generate less verbose category names, e.g., of the form colorGreen or even simply Green?
Solution
OTHER TIPS
A bit late to the party, but for those searching for this in 2021:
If you're prepared to do a bit of wrangling, you can take apart the statsmodels Summary object (returned when calling summary() on a fitted model), convert it to a DataFrame, and format it from there.
The Summary object has a tables attribute. The first table holds the overall fit statistics, and the second is the coefficients table. Each table has an as_html() method whose output you can pass to the pandas read_html() function.
```python
import pandas as pd
df = pd.read_html(your_fitted_model.summary().tables[1].as_html(), header=0)[0]
```
From there you can strip out the patsy formatting via regular string and dataframe methods.
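As a minimal sketch of that last step, a regular expression can reduce each treatment-coded label to either colorGreen or Green. This assumes labels of the form C(var, ...)[T.level] with a plain identifier as the variable name; the helper simplify_patsy_name and its keep_variable flag are hypothetical, not part of patsy or statsmodels.

```python
import re

# Hypothetical pattern for treatment-coded patsy labels such as
# "C(color, Treatment('White'))[T.Green]" or "C(color)[T.Green]".
# Group 1 is the variable name, group 2 the level. Assumes the variable is a
# plain identifier and neither part contains "]"; anything else (e.g.
# "Intercept", numeric regressors, interaction terms) is returned unchanged.
PATSY_LABEL = re.compile(r"^C\((\w+)[^\]]*\)\[T\.([^\]]+)\]$")


def simplify_patsy_name(name: str, keep_variable: bool = True) -> str:
    """Return 'colorGreen' (or 'Green' if keep_variable is False) for a patsy label."""
    match = PATSY_LABEL.match(name)
    if match is None:
        return name  # not a simple treatment-coded label; leave as-is
    variable, level = match.groups()
    return variable + level if keep_variable else level


if __name__ == "__main__":
    label = "C(color, Treatment('White'))[T.Green]"
    print(simplify_patsy_name(label))                       # colorGreen
    print(simplify_patsy_name(label, keep_variable=False))  # Green
    print(simplify_patsy_name("Intercept"))                 # Intercept
```

Assuming the term labels end up in the first column of the DataFrame from read_html, something like df.iloc[:, 0] = df.iloc[:, 0].map(simplify_patsy_name) should rewrite them. If you only need the coefficients, your_fitted_model.params.rename(simplify_patsy_name) applies the same mapping to the params index without the HTML round trip.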