Question

Some R datasets can be loaded into a Pandas DataFrame or Panel quite easily:

import pandas.rpy.common as com
infert = com.load_data('infert')
print(infert.head())

This appears to work as long as the R dataset has at most 3 dimensions. Higher-dimensional datasets produce an error message:

In [67]: com.load_data('Titanic')
Cannot handle dim=4

This error message originates in the _convert_array function in rpy/common.py.

Sure, it makes sense that Pandas cannot directly shoehorn a 4-dimensional array into a DataFrame or Panel, but is there a workaround for loading datasets like Titanic into a DataFrame (perhaps with a hierarchical index)?


Solution 2

With Pandas version 0.13.0 or newer, pandas.rpy.common.load_data can load higher-dimensional datasets such as Titanic:

import pandas.rpy.common as com
df = com.load_data('Titanic')
print(df.head())

yields

  Survived    Age     Sex Class value
0       No  Child    Male   1st   0.0
1       No  Child    Male   2nd   0.0
2       No  Child    Male   3rd  35.0
3       No  Child    Male  Crew   0.0
4       No  Child  Female   1st   0.0
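
Since the question asks about a hierarchical index, the long-format result can be converted to one with set_index. The following is a minimal sketch, assuming the column names shown above. The DataFrame is rebuilt by hand from the five printed rows rather than loaded through pandas.rpy, so it runs without R; the variable names are illustrative only.

```python
import pandas as pd

# Hand-copied from the five rows printed above (not the full Titanic dataset).
df = pd.DataFrame({
    "Survived": ["No"] * 5,
    "Age": ["Child"] * 5,
    "Sex": ["Male", "Male", "Male", "Male", "Female"],
    "Class": ["1st", "2nd", "3rd", "Crew", "1st"],
    "value": [0.0, 0.0, 35.0, 0.0, 0.0],
})

# Move the four categorical columns into a MultiIndex, leaving a Series of counts.
# Sorting the index keeps label-based lookups efficient.
counts = df.set_index(["Class", "Sex", "Age", "Survived"])["value"].sort_index()

# Look up a single cell by its full label tuple.
third_class_boys_lost = counts.loc[("3rd", "Male", "Child", "No")]
print(counts)
print(third_class_boys_lost)
```

With the real dataset, the same set_index call on the DataFrame returned by load_data should give one entry per combination of the four factors.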

OTHER TIPS

Using @joran's very helpful suggestion, after installing the reshape package with

% sudo R
R> install.packages('reshape')

I managed to load the Titanic dataset into a Pandas DataFrame with:

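import pandas.rpy.common as com
import rpy2.robjects as ro

r = ro.r
# Load the reshape package in the embedded R session.
r('library(reshape)')
# melt() flattens the 4-dimensional table into long format, one row per cell.
df = com.convert_robj(r('melt(Titanic)'))
print(df.head())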

which printed

  Class     Sex    Age Survived  value
1   1st    Male  Child       No      0
2   2nd    Male  Child       No      0
3   3rd    Male  Child       No     35
4  Crew    Male  Child       No      0
5   1st  Female  Child       No      0
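
To illustrate what melt does here, the sketch below flattens a small N-dimensional array into the same long format using only NumPy and pandas. It does not reproduce the reshape package itself. The 2x2x2 array and its labels are made up for the example. In the output above, the first dimension (Class) varies fastest; this sketch uses NumPy's default row-major order, so its rows come out in a different order.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2x2 array of counts with made-up axis labels.
arr = np.arange(8).reshape(2, 2, 2)
labels = {
    "Class": ["1st", "2nd"],
    "Sex": ["Male", "Female"],
    "Survived": ["No", "Yes"],
}

# from_product enumerates label combinations with the last level varying
# fastest, which matches the element order of arr.ravel() (row-major).
index = pd.MultiIndex.from_product(list(labels.values()), names=list(labels.keys()))

# One row per array cell: the labels become columns next to the cell value.
long_df = pd.Series(arr.ravel(), index=index, name="value").reset_index()
print(long_df)
```
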
Licensed under: CC-BY-SA with attribution