There's an open issue for this. Please discuss alternatives there.
Specifying the form of names for categories generated by patsy/statsmodels 'C'
15-06-2023
Question
By default, Patsy's C seems to generate categories with names of the form
C(color, Treatment('White'))[T.Green]
at least when used in a formula provided to statsmodels ols. Is there a way to specify that C generate less verbose category names, e.g., of the form colorGreen or even simply Green?
Solution
Other tips
Bit late to the party, but for those searching for this in 2021:
If you're prepared to do a bit of wrangling, you can take apart the statsmodels Summary object (returned when calling summary() on a fitted model), convert it to a DataFrame, and format it from there.
The Summary object has a tables attribute, which is a list of tables. The first holds the overall fit results, the second is the coefficients table. Each table has an as_html() method whose output you can pass to the pandas read_html() function.
df = pd.read_html(your_fitted_model.summary().tables[1].as_html(), header=0)[0]
From there you can strip out the patsy formatting via regular string and dataframe methods.
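As a rough sketch of that last step, here is one way the stripping could look with a plain regular expression. The helper name shorten_patsy_name and its keep_variable flag are made up for this example (they are not part of patsy or statsmodels), and the pattern assumes names of the C(var, ...)[T.level] form shown in the question; other contrast codings may produce names it does not cover.

```python
import re

# Matches one patsy categorical piece such as
#   C(color, Treatment('White'))[T.Green]
# group 1 = variable name ("color"), group 2 = level ("Green").
# Assumption: neither the contrast spec nor the level contains square brackets.
_PATSY_CAT = re.compile(r"\bC\(\s*([^,()\s]+)[^\[\]]*\)\[(?:T\.)?([^\]]+)\]")


def shorten_patsy_name(name, keep_variable=True):
    """Hypothetical helper: turn a patsy term name into a shorter label.

    keep_variable=True  gives "colorGreen"
    keep_variable=False gives "Green"
    Names without a C(...)[...] piece (e.g. "Intercept") pass through unchanged.
    """
    def _replace(match):
        prefix = match.group(1) if keep_variable else ""
        return prefix + match.group(2)

    return _PATSY_CAT.sub(_replace, name)


names = [
    "Intercept",
    "C(color, Treatment('White'))[T.Green]",
    "C(color)[T.Green]:weight",
]
short = [shorten_patsy_name(n) for n in names]
print(short)  # ['Intercept', 'colorGreen', 'colorGreen:weight']
```

With the DataFrame from above, the same function could be mapped over the column that holds the term names, for example df.iloc[:, 0].map(shorten_patsy_name), assuming the names end up in the first column. It should also work directly on the fitted parameters via your_fitted_model.params.rename(index=shorten_patsy_name), which skips the HTML round trip.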