Question

import statsmodels.formula.api as sm
import numpy as np
import pandas

url = "http://vincentarelbundock.github.com/Rdatasets/csv/HistData/Guerry.csv"
df = pandas.read_csv(url)
df = df[['Lottery', 'Literacy', 'Wealth', 'Region']].dropna()
print df.head()
mod = sm.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)
res = mod.fit()
print res.summary()

This raises the following error after printing the table:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-f69caff21ed0> in <module>()
6 df = df[['Lottery', 'Literacy', 'Wealth', 'Region']].dropna()
7 print df.head()
----> 8 mod = sm.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)
9 res = mod.fit()
10 print res.summary()

TypeError: from_formula() takes at least 3 arguments (2 given)

This does not seem like acceptable behavior. What am I doing wrong?


Solution

(The guess in my comment was wrong)

Your version of statsmodels is too old. The documentation and example are correct for the released version of statsmodels 0.5.
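To confirm which version is installed, something like the following may help. It is only a sketch: installed_version is a hypothetical helper name, and it assumes the package exposes a __version__ attribute (statsmodels does).

```python
import importlib


def installed_version(package_name):
    """Return the package's __version__ string, or None if unavailable.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        # The package is not installed in this environment.
        return None
    # Not every module defines __version__, so fall back to None.
    return getattr(module, "__version__", None)


print(installed_version("statsmodels"))
```

If this prints a version older than 0.5, the rename described in this answer applies to your install.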

The keyword was renamed from df to data as of 0.5.0.dev-1bbd4ca.

So either upgrade, which I highly recommend, or use the old keyword name:

mod = sm.ols(formula='Lottery ~ Literacy + Wealth + Region', df=df)

This should work with the version that you have.
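If the same script has to run against both old and new installs, one option is to try the new keyword first and fall back to the old one. This is a rough sketch, not something statsmodels provides: build_model is a hypothetical helper, and it relies on the old version raising TypeError when data= is passed, as in the traceback above.

```python
def build_model(model_factory, formula, frame):
    """Call a formula-based model factory with whichever keyword it accepts.

    model_factory is assumed to behave like statsmodels.formula.api.ols:
    newer releases take the DataFrame as data=..., older ones as df=...
    """
    try:
        # Keyword name used by statsmodels 0.5 and later.
        return model_factory(formula=formula, data=frame)
    except TypeError:
        # Keyword name used before the rename.
        return model_factory(formula=formula, df=frame)
```

With the question's names this would be called as build_model(sm.ols, 'Lottery ~ Literacy + Wealth + Region', df). Note that catching TypeError this broadly can also hide unrelated argument errors, so upgrading is still the cleaner fix.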

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow