Question

I'm trying to use rpy2 to run a logistic regression. I managed to fit it, but I don't know how to extract the coefficients and p-values from the result. I don't want to print the values to the screen but to create a function that returns them so I can use them independently.

import rpy2.robjects as ro

read = ro.r['read.csv']
head = ro.r['head']
summary = ro.r['summary']

mydata = read("http://www.ats.ucla.edu/stat/data/binary.csv")
#cabecalho = head(mydata)
formula = 'admit ~ gre + gpa + rank'
mylogit = ro.r.glm(formula=ro.r(formula), data=mydata,
                   family=ro.r('binomial(link="logit")'))
#What next?

Solution

I don't know offhand how to get the p-values from the fit object itself, but any of the other components can be extracted like this:

In [24]:
#what is stored in mylogit?
mylogit.names
Out[24]:
<StrVector - Python:0x10a01a0e0 / R:0x10353ab20>

['coef..., 'resi..., 'fitt..., ..., 'meth..., 'cont..., 'xlev...]
In [25]:
#looks like the first item is the coefficients
mylogit.names[0]
Out[25]:
'coefficients'
In [26]:
#OK, let's get the coefficients.
mylogit[0]
Out[26]:
<FloatVector - Python:0x10a01a5f0 / R:0x1028bcc80>
[-3.449548, 0.002294, 0.777014, -0.560031]
In [27]:
#be careful that the index from print is R index, starting with 1. I don't see p values here
print(mylogit.names)
 [1] "coefficients"      "residuals"         "fitted.values"    
 [4] "effects"           "R"                 "rank"             
 [7] "qr"                "family"            "linear.predictors"
[10] "deviance"          "aic"               "null.deviance"    
[13] "iter"              "weights"           "prior.weights"    
[16] "df.residual"       "df.null"           "y"                
[19] "converged"         "boundary"          "model"            
[22] "call"              "formula"           "terms"            
[25] "data"              "offset"            "control"          
[28] "method"            "contrasts"         "xlevels"   
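Since the question asks for a function rather than printed output, the name list above can drive a small lookup helper. This is a sketch: `r_component` is a hypothetical name, and it assumes the fit object indexes like a Python sequence and exposes `.names`, as the session above shows.

```python
def r_component(fit, name):
    """Fetch a named component from an R list-like object (e.g. the glm fit)
    without hard-coding numeric indices.

    Note that indexing on the Python side is 0-based, even though R's
    print() shows 1-based indices.
    """
    names = list(fit.names)
    if name not in names:
        raise KeyError("no component named %r; available: %s" % (name, names))
    return fit[names.index(name)]

# coefs = r_component(mylogit, 'coefficients')
```

The same pattern works for any of the 30 components listed above (`'aic'`, `'deviance'`, and so on).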

Edit

The p-values for each term:

In [55]:
#p values:
list(summary(mylogit)[-6])[-4:]
Out[55]:
[0.0023265825120094407,
 0.03564051883525258,
 0.017659683902155117,
 1.0581094283250368e-05]

And:

In [56]:
#coefficients 
list(summary(mylogit)[-6])[:4]
Out[56]:
[-3.449548397668471,
 0.0022939595044433334,
 0.7770135737198545,
 -0.5600313868499897]
In [57]:
#S.E.
list(summary(mylogit)[-6])[4:8]
Out[57]:
[1.1328460085495897,
 0.001091839095422917,
 0.327483878497867,
 0.12713698917130048]
In [58]:
#Z value
list(summary(mylogit)[-6])[8:12]
Out[58]:
[-3.0450285137032984,
 2.1010050968680347,
 2.3726773277632214,
 -4.4049445444662885]

Or more generally:

In [60]:

import numpy as np
In [62]:

COEF=np.array(summary(mylogit)[-6]) #it has a shape of (number_of_terms, 4)
In [63]:

COEF[:, -1] #p-value
Out[63]:
array([  2.32658251e-03,   3.56405188e-02,   1.76596839e-02,
         1.05810943e-05])
In [66]:

COEF[:, 0] #coefficients
Out[66]:
array([ -3.44954840e+00,   2.29395950e-03,   7.77013574e-01,
        -5.60031387e-01])
In [68]:

COEF[:, 1] #S.E.
Out[68]:
array([  1.13284601e+00,   1.09183910e-03,   3.27483878e-01,
         1.27136989e-01])
In [69]:

COEF[:, 2] #Z
Out[69]:
array([-3.04502851,  2.1010051 ,  2.37267733, -4.40494454])
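Wrapped up as the kind of function the question asks for (a sketch; `coef_table` is a hypothetical name, and the column order estimate / SE / z / p matches R's `summary.glm` layout shown above):

```python
import numpy as np

def coef_table(coef_matrix, terms):
    """Split an (n_terms, 4) summary coefficient matrix into named pieces.

    Columns follow R's summary.glm layout:
    Estimate, Std. Error, z value, Pr(>|z|).
    """
    m = np.asarray(coef_matrix)
    return {
        "terms": list(terms),
        "coef": m[:, 0],
        "se": m[:, 1],
        "z": m[:, 2],
        "p": m[:, 3],
    }

# table = coef_table(COEF, ['(Intercept)', 'gre', 'gpa', 'rank'])
# table['p']  # p-values, in term order
```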

You can also use summary(mylogit).rx2('coefficients') (or rx), if you know that 'coefficients' is a named component of the summary.

Other tips

This isn't quite an answer to what you asked, but if your question is more generally "how to move a logistic regression to Python", why not use statsmodels?

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
model = smf.glm('admit ~ gre + gpa + rank', df, family=sm.families.Binomial()).fit()
print(model.summary())

This prints:

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  admit   No. Observations:                  400
Model:                            GLM   Df Residuals:                      396
Model Family:                Binomial   Df Model:                            3
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -229.72
Date:                Sat, 29 Mar 2014   Deviance:                       459.44
Time:                        11:56:19   Pearson chi2:                     399.
No. Iterations:                     5                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -3.4495      1.133     -3.045      0.002        -5.670    -1.229
gre            0.0023      0.001      2.101      0.036         0.000     0.004
gpa            0.7770      0.327      2.373      0.018         0.135     1.419
rank          -0.5600      0.127     -4.405      0.000        -0.809    -0.311
==============================================================================

While there are still some statistical procedures that only have a good implementation in R, for straightforward things like linear models it's probably a lot easier to use statsmodels than to fight with rpy2, since all of the introspection, built-in documentation, tab completion (in IPython), etc. work directly on statsmodels objects.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow