Question

By default, Patsy's C seems to generate categories with names of the form

C(color, Treatment('White'))[T.Green]

at least when used in a formula provided to statsmodels ols. Is there a way to make C generate less verbose category names, e.g., of the form

colorGreen

or even simply

Green

Solution

There is an open issue for this on the patsy issue tracker. Please discuss alternatives there.

https://github.com/pydata/patsy/issues/19

OTHER TIPS

Bit late to the party but for those searching this in 2021.

If you're prepared to do a bit of wrangling, you can take apart the statsmodels Summary object (returned when calling summary() on a fitted model), convert it to a DataFrame, and format it from there.

The Summary object has a tables attribute. The first table is the result of the fit, the second is the coefficients table. Each table has an as_html() method whose output you can pass to the pandas read_html() function.

df = pd.read_html(your_fitted_model.summary().tables[1].as_html(), header=0)[0]

From there you can strip out the patsy formatting via regular string and dataframe methods.
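As a rough sketch of that last step, the helper below uses a regular expression to shorten names such as C(color, Treatment('White'))[T.Green]. The function name clean_patsy_name and its keep_variable flag are made up for illustration and are not part of patsy or statsmodels. The pattern assumes terms look like C(var, ...)[T.level] or var[T.level], so other contrast codings or levels containing square brackets may need adjusting.

```python
import re

# Hypothetical helper, not part of patsy or statsmodels.
# Matches one categorical term such as
#   C(color, Treatment('White'))[T.Green]   or   color[T.Green]
_PATSY_TERM = re.compile(
    r"""
    (?:
        C\(\s*(?P<cvar>[^,()\s]+)[^\[\]]*\)   # C(var, ...) wrapper
      |
        (?P<var>\w+)                          # or a bare variable name
    )
    \[(?:T\.)?(?P<level>[^\]]+)\]             # [T.level] or [level]
    """,
    re.VERBOSE,
)


def clean_patsy_name(name, keep_variable=True):
    """Shorten a patsy term name to 'colorGreen', or just 'Green'."""

    def _repl(match):
        var = match.group("cvar") or match.group("var")
        level = match.group("level")
        return f"{var}{level}" if keep_variable else level

    # Names without a categorical term (e.g. 'Intercept') pass through unchanged.
    return _PATSY_TERM.sub(_repl, name)


print(clean_patsy_name("C(color, Treatment('White'))[T.Green]"))
print(clean_patsy_name("C(color, Treatment('White'))[T.Green]", keep_variable=False))
```

With the DataFrame from read_html, the term names should sit in the first column, so something like df.iloc[:, 0].map(clean_patsy_name) ought to rewrite them in one pass.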

Licensed under: CC-BY-SA with attribution