Pulling variable names when using pandas and statsmodels
24-06-2021
Question
I'm trying to access the names of variables from the results generated by statsmodels. I'll elaborate more after the example code.
import statsmodels.api as sm  # older releases: scikits.statsmodels.api
import pandas as pd
data = sm.datasets.longley.load()
df = pd.DataFrame(data.exog, columns=data.exog_name)
y = data.endog
df['intercept'] = 1.
olsresult = sm.OLS(y, df).fit()
The summary output (olsresult.summary()) includes the variable names. Calling something like olsresult.params returns the following:
In [21]: olsresult.params
Out[21]:
GNPDEFL 15.061872
GNP -0.035819
UNEMP -2.020230
ARMED -1.033227
POP -0.051104
YEAR 1829.151465
intercept -3482258.634596
What I'd like to do is create something like a dictionary with the variable name as the key and the parameter value as the value, something like {'GNPDEFL': 15.0618, 'GNP': -0.035819} and so on. If that's impossible, is there any other way to access the variable name and value individually?
Solution
It's always worth trying the obvious.. :^)
In [14]: olsresult.params
Out[14]:
GNPDEFL 15.061872
GNP -0.035819
UNEMP -2.020230
ARMED -1.033227
POP -0.051104
YEAR 1829.151465
intercept -3482258.634597
In [15]: dict(olsresult.params)
Out[15]:
{'ARMED': -1.0332268671737328,
'GNP': -0.035819179292614578,
'GNPDEFL': 15.061872271452557,
'POP': -0.051104105653539733,
'UNEMP': -2.0202298038172479,
'YEAR': 1829.151464613984,
'intercept': -3482258.6345966831}
See also the .to_dict() method of Series objects.
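A minimal sketch of the .to_dict() route, assuming a small hand-built Series stands in for olsresult.params. The values are rounded copies of the output above, and the names as_dict and rounded are illustrative only:

```python
import pandas as pd

# Stand-in for olsresult.params: a Series indexed by variable name.
# Values are rounded copies of the output shown above, for illustration only.
params = pd.Series({"GNPDEFL": 15.061872, "GNP": -0.035819, "UNEMP": -2.020230})

# Series.to_dict() builds the same name -> value mapping as dict(params),
# but returns plain Python floats rather than NumPy scalars.
as_dict = params.to_dict()

# Round before converting if shorter values are wanted.
rounded = params.round(4).to_dict()

print(as_dict)
print(rounded)
```

With a real fit, params would simply be olsresult.params; the index order of the Series is preserved in the resulting dict on current Python versions.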
OTHER TIPS
olsresult.params is a pandas.Series object, which is dict-like, so you may not need to convert it to a dict at all.
In [12]: olsresult.params.get('GNP')
Out[12]: -0.035819179292566283
In [13]: olsresult.params['GNP']
Out[13]: -0.035819179292566283
In [14]: for key, value in olsresult.params.iteritems():
....: print key, value
....:
GNPDEFL 15.0618722714
GNP -0.0358191792926
UNEMP -2.02022980382
ARMED -1.03322686717
POP -0.0511041056537
YEAR 1829.15146461
intercept -3482258.6346
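The session above uses Python 2 syntax (a print statement and iteritems()). On Python 3 with a recent pandas, the same loop would likely be written with Series.items(). The sketch below assumes a hand-built Series in place of olsresult.params, and the names names, values, and missing are illustrative only:

```python
import pandas as pd

# Stand-in for olsresult.params; values are rounded copies of the output above.
params = pd.Series({"GNPDEFL": 15.061872, "GNP": -0.035819, "UNEMP": -2.020230})

# Series.items() yields (label, value) pairs, like dict.items().
for name, value in params.items():
    print(name, value)

# Names and values can also be pulled out separately.
names = list(params.index)   # variable names only
values = params.tolist()     # parameter values only, as Python floats

# .get() accepts a default for labels that are not present, like dict.get().
missing = params.get("NOT_A_VARIABLE", 0.0)
```

This covers the "access the variable name and value individually" part of the question without building a dict first.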