Question

I have an rpy2 data frame of type <class 'rpy2.robjects.vectors.DataFrame'>. How can I convert it to a Python list or tuple with each row as an element? Thanks!

Solution

I figured it out. I hope this helps if you are looking for an answer. Here df[j] is column j, so df[j][i] is the value in row i of that column:

output = [tuple([df[j][i] for j in range(df.ncol)]) for i in range(df.nrow)]

OTHER TIPS

I recently stumbled over one potential problem with this. Given a data frame from R:

|   |   a   | c | b |  d  |
|---|-------|---|---|-----|
| 1 | info1 | 2 | 1 | op1 |
| 2 | info2 | 3 | 2 | 3   |
| 3 | info3 | 4 | 3 | 3   |
| 4 | info4 | 5 | 4 | 3   |
| 5 | info5 | 6 | 5 | 3   |
| 6 | info6 | 7 | 6 | 3   |
| 7 | 9     | 8 | 7 | 3   |

(Yes, I know: mixing data types such as str and float in one column may not be realistic, but the same holds for columns that contain only factors.)

For columns a and d, which R stores as factors, the conversion returns the integer factor codes rather than the labels you usually want. The reason is stated in the rpy2 manual:

R’s factors are somewhat peculiar: they aim at representing a memory-efficient vector of labels, and in order to achieve it are implemented as vectors of integers to which are associated a (presumably shorter) vector of labels. Each integer represents the position of the label in the associated vector of labels.
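
To make the quoted description concrete, here is a small pure-Python sketch of how column d from the table above might be encoded. It does not use rpy2; the names levels, codes and decode_factor are hypothetical, and the level order assumes R's default of sorting the labels.

```python
# Pure-Python model of an R factor; no rpy2 involved.
# Assumption: levels are in R's default sorted order, so "3" precedes "op1".
levels = ("3", "op1")

# Each code is a 1-based position into `levels`
# (column d of the table: op1, 3, 3, 3, 3, 3, 3).
codes = (2, 1, 1, 1, 1, 1, 1)


def decode_factor(codes, levels):
    """Map 1-based factor codes back to their labels."""
    return tuple(levels[code - 1] for code in codes)


labels = decode_factor(codes, levels)
print(labels)
```

A naive conversion hands you codes; the subtraction of 1 is needed because R positions start at 1 while Python indices start at 0.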

The following rough draft code is a step towards handling this case:

import rpy2.robjects as robjects

# `dataframe` is the rpy2.robjects.vectors.DataFrame to convert.
rownames = list(dataframe.rownames)
col2data = {}
for cn, col in dataframe.items():
    if isinstance(col, robjects.vectors.FactorVector):
        # Factor codes are 1-based positions into the levels vector,
        # so subtract 1 to get the Python index of each label.
        colevel = tuple(col.levels)
        ncol = tuple(colevel[i - 1] for i in col)
    else:
        ncol = tuple(col)
    col2data[cn] = ncol

col2data['rownames'] = rownames

The output is a dict that maps column names to their values. Looping over the columns and transposing them then produces one tuple per row, which is what the question asks for.
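
The transposition step can be sketched with zip. The col2data dict below is written by hand to mirror the first three rows of the table, so it is an assumption about what the loop produces rather than real rpy2 output.

```python
# Hand-written stand-in for the dict built by the loop above
# (first three rows of the example table only).
col2data = {
    "a": ("info1", "info2", "info3"),
    "c": (2, 3, 4),
    "b": (1, 2, 3),
    "d": ("op1", "3", "3"),
    "rownames": ["1", "2", "3"],
}

# Keep the row names out of the row tuples themselves.
columns = [name for name in col2data if name != "rownames"]

# zip(*...) turns the column-wise data into one tuple per row.
rows = list(zip(*(col2data[name] for name in columns)))
print(rows)
```

The column order in each tuple follows the dict's insertion order, which matches the data frame's column order when the dict is filled column by column.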

Licensed under: CC-BY-SA with attribution