Question

I am trying to calculate the phi coefficient, Cramer's V and the contingency coefficient using the rpy module in Python. I can do this in R, but I am at my wits' end trying to replicate it in Python.

library(vcd)
data <- read.csv("test.csv")
assocstats(table(data$var_4, data$target))

Output:
                X^2 df P(> X^2)
Likelihood Ratio 113.28  1        0
Pearson          112.51  1        0

Phi-Coefficient   : 0.15 
Contingency Coeff.: 0.148 
Cramer's V        : 0.15 

Implementation in Python:

import rpy
# Already connected to MySQL; cur is an open cursor
q = "SELECT var_4, target FROM test"
cur.execute(q)
data = cur.fetchall()
ls1 = []
ls2 = []
# Split the (var_4, target) rows into two Python lists
for row in data:
    ls1.append(row[0])
    ls2.append(row[1])
rpy.r.library("vcd")
rpy.r.assocstats(rpy.r.table(ls1,ls2))

Error:

Traceback (most recent call last):
File "<pyshell#14>", line 1, in <module>
rpy.r.assocstats(rpy.r.table(ls1,ls2))
RPy_RException: Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

The other approach I am trying is to calculate phi squared with scipy and then use the mathematical formulas to derive Cramer's V and the rest. But I intend to use rpy heavily in my project going forward, so I would really appreciate it if you could point out the problem in the approach above. I think I am not passing the input to the function in the proper format. Thanks in advance.


Solution

The error shows that R's sort function cannot handle list input. Testing this with a sample list:

> templist <- list(c(3,2,1))
> sort(templist)
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 
  'x' must be atomic

> newlist <- unlist(templist)
> is.atomic(newlist)
[1] TRUE

> sort(newlist)
[1] 1 2 3

The key here is unlist. You can confirm whether your inputs ls1 and ls2 are lists using rpy.r.is.list. To unlist them, call rpy.r.unlist on both ls1 and ls2.

To use functions with a . in the name, such as is.list(), see the question "Accessing functions with a dot in their name (eg. "as.vector") using rpy2".

I do not have rpy installed, so I cannot confirm this, but I expect it to work. Let us know.

Other tips

Are you really using rpy2, as you stated in the tags? It looks like rpy to me. Either way, I strongly recommend migrating to rpy2 if you haven't done so already.

It looks like your ls1 and ls2 are just lists of numbers, so the problem should be a very simple one:

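In [60]:
# setting up
import numpy as np
import rpy2.robjects as ro
# load vcd first so that assocstats can be looked up
ro.r['library']('vcd')
mydata = ro.r['data.frame']
table = ro.r['table']
assocstats = ro.r['assocstats']
summary = ro.r['summary']
# wrap the random numpy arrays as R numeric (atomic) vectors
ls1 = ro.FloatVector(np.random.random(50))
ls2 = ro.FloatVector(np.random.random(50))
result = assocstats(table(ls1, ls2))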

In [61]:
# what is in the result
print(result.names)
[1] "table"       "chisq_tests" "phi"         "contingency" "cramer"     

In [62]:
# access the chi-square table
print(result.rx('chisq_tests'))
$chisq_tests
                       X^2   df  P(> X^2)
Likelihood Ratio  391.2023 2401 1.0000000
Pearson          2450.0000 2401 0.2382456
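
As a cross-check on whatever R returns, the formula-based route mentioned in the question can be sketched in plain Python. This is only a minimal sketch, not vcd's implementation: the function name association_stats and the dictionary keys are made up for illustration, it uses the Pearson chi-square without continuity correction, and it assumes the standard textbook formulas phi = sqrt(X^2/n), C = sqrt(X^2/(X^2+n)) and V = sqrt(X^2/(n*(min(rows, cols)-1))).

```python
from collections import Counter
from math import sqrt


def association_stats(xs, ys):
    """Pearson chi-square, phi, contingency coefficient and Cramer's V.

    Hypothetical helper for illustration; xs and ys are equal-length
    sequences of category labels (for example var_4 and target).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    row_totals = Counter(xs)            # marginal counts of the first variable
    col_totals = Counter(ys)            # marginal counts of the second variable
    observed = Counter(zip(xs, ys))     # cell counts of the contingency table
    if len(row_totals) < 2 or len(col_totals) < 2:
        raise ValueError("each variable needs at least two distinct values")

    # Pearson chi-square: sum over all cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return {
        "chi2": chi2,
        "phi": sqrt(chi2 / n),                  # usually reported for 2x2 tables
        "contingency": sqrt(chi2 / (chi2 + n)),
        "cramer_v": sqrt(chi2 / (n * k)),
    }
```

With the two columns from the question this would be called as association_stats(ls1, ls2). The figures in the question's R output are consistent with these formulas: a Pearson X^2 of 112.51 and phi of 0.15 imply roughly 5000 rows, which gives a contingency coefficient of about 0.148.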

Licensed under: CC-BY-SA with attribution