Question

I want to construct a data frame in an Rcpp function, but when I get it back in R, it doesn't really look like a data frame: it prints as a plain list. I've tried pushing vectors etc., but it leads to the same thing. Consider:

RcppExport SEXP makeDataFrame(SEXP in) {
    Rcpp::DataFrame dfin(in);
    Rcpp::DataFrame dfout;
    for (int i=0;i<dfin.length();i++) {
        dfout.push_back(dfin(i));
    }

    return dfout;
}

In R:

> .Call("makeDataFrame", mtcars, PACKAGE = "myPkg")
[[1]]
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4

[[2]]
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

[[3]]
 [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8
[13] 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0 304.0 350.0
[25] 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0

[[4]]
 [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52
[20]  65  97 150 150 245 175  66  91 113 264 175 335 109

[[5]]
 [1] 3.90 3.90 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 3.07 2.93
[16] 3.00 3.23 4.08 4.93 4.22 3.70 2.76 3.15 3.73 3.08 4.08 4.43 3.77 4.22 3.62
[31] 3.54 4.11

[[6]]
 [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070
[13] 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840
[25] 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780

[[7]]
 [1] 16.46 17.02 18.61 19.44 17.02 20.22 15.84 20.00 22.90 18.30 18.90 17.40
[13] 17.60 18.00 17.98 17.82 17.42 19.47 18.52 19.90 20.01 16.87 17.30 15.41
[25] 17.05 18.90 16.70 16.90 14.50 15.50 14.60 18.60

[[8]]
 [1] 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 1

[[9]]
 [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1

[[10]]
 [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4

[[11]]
 [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2

Solution

It seems Rcpp can return a proper data.frame, provided you supply the names explicitly. I'm not sure how to adapt this to your example with arbitrary names.

mkdf <- '
    Rcpp::DataFrame dfin(input);
    Rcpp::DataFrame dfout;
    for (int i=0;i<dfin.length();i++) {
        dfout.push_back(dfin(i));
    }

    // indices are 0-based, so dfout(1) and dfout(2) are the 2nd and 3rd input columns
    return Rcpp::DataFrame::create( Named("x")= dfout(1), Named("y") = dfout(2));
'
library(inline)
test <- cxxfunction( signature(input="data.frame"),
                              mkdf, plugin="Rcpp")

test(input=head(iris))
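A likely reason the explicit version works is that Rcpp::DataFrame::create() hands back an object carrying the attributes R looks for. R only treats a list as a data frame when it has "names", "row.names", and a "class" that includes "data.frame". The following is a rough plain C++ model of that idea, not Rcpp code. Attributes, data_frame_attributes and inherits_data_frame are hypothetical names, and row names are simplified to the strings "1" to "n" (R normally stores automatic row names in a compact integer form).

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Simplified model of an R object's attributes (plain C++, not the Rcpp API):
// each attribute name maps to a character vector.
using Attributes = std::map<std::string, std::vector<std::string>>;

// Attributes that turn a plain list of columns into a data.frame.
// Row names are modeled as the strings "1".."nrow" for simplicity.
Attributes data_frame_attributes(const std::vector<std::string>& col_names,
                                 std::size_t nrow) {
    Attributes attrs;
    attrs["names"] = col_names;
    std::vector<std::string> rows;
    rows.reserve(nrow);
    for (std::size_t i = 1; i <= nrow; ++i) {
        rows.push_back(std::to_string(i));
    }
    attrs["row.names"] = rows;
    attrs["class"] = std::vector<std::string>{"data.frame"};
    return attrs;
}

// Mirrors R's inherits(x, "data.frame"): print() dispatches on the class
// attribute, so without it a list is printed element by element
// ([[1]], [[2]], ... when it has no names).
bool inherits_data_frame(const Attributes& attrs) {
    auto it = attrs.find("class");
    if (it == attrs.end()) return false;
    const std::vector<std::string>& cls = it->second;
    return std::find(cls.begin(), cls.end(), "data.frame") != cls.end();
}
```

The question's output ([[1]], [[2]], ...) is what such an attribute-less list looks like when printed.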

OTHER TIPS

Briefly:

  • DataFrames are indeed just like lists, with the added restriction that all columns must have a common length, so they are best constructed column by column.

  • The best way is often to look at our unit tests. Here, inst/unitTests/runit.DataFrame.R groups the tests for the DataFrame class.

  • You also found the .push_back() member function in Rcpp, which we added for convenience and by analogy with the STL. We do warn that it is not recommended: because of the way R objects are constructed, we essentially always need to make a full copy, so .push_back() is not very efficient.

  • Despite my answering here frequently, the rcpp-devel list is a better place for Rcpp questions.
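The first point (a data frame is a list whose columns all share one length, best built one whole column at a time) can be sketched in plain C++. Frame and its members are hypothetical illustrations, not part of Rcpp.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a data frame (plain C++, not Rcpp): a list of named
// columns with the extra rule that every column has the same length.
class Frame {
public:
    std::size_t ncol() const { return columns_.size(); }
    std::size_t nrow() const {
        return columns_.empty() ? 0 : columns_.front().size();
    }

    // Build the frame one whole column at a time, enforcing the common length.
    void add_column(const std::string& name, std::vector<double> values) {
        if (!columns_.empty() && values.size() != nrow()) {
            throw std::invalid_argument("column '" + name + "' has the wrong length");
        }
        names_.push_back(name);
        columns_.push_back(std::move(values));
    }

    const std::vector<double>& column(std::size_t j) const { return columns_.at(j); }
    const std::string& name(std::size_t j) const { return names_.at(j); }

private:
    std::vector<std::string> names_;
    std::vector<std::vector<double>> columns_;
};
```

Checking the length once per column keeps the invariant cheap to enforce, which is one reason column-wise construction fits data frames better than row-wise appending.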

Using the information from @baptiste's answer, this is what finally gives a well-formed data frame:

RcppExport SEXP makeDataFrame(SEXP in) {
    Rcpp::DataFrame dfin(in);
    Rcpp::DataFrame dfout;
    Rcpp::CharacterVector namevec;
    std::string namestem = "Column Heading ";
    for (int i=0;i<2;i++) {  // copies only the first two columns
        dfout.push_back(dfin(i));
        // name the columns "Column Heading a", "Column Heading b", ...
        namevec.push_back(namestem+std::string(1,(char)(((int)'a') + i)));
    }
    dfout.attr("names") = namevec;
    // let R's as.data.frame() turn the named list into a data.frame
    Rcpp::DataFrame x;
    Rcpp::Language call("as.data.frame",dfout);
    x = call.eval();
    return x;
}

I think the point remains that this might be inefficient, due both to push_back (as @Dirk pointed out) and to the second evaluation via the Language call. I looked through the Rcpp unit tests but haven't been able to come up with anything better yet. Does anybody have any ideas?
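One more caveat about the naming loop: namestem+std::string(1,(char)(((int)'a') + i)) appends a single character, so it only yields the letters a to z for the first 26 columns and then continues with '{', '|' and so on. Below is a minimal plain C++ sketch of a hypothetical make_column_names() helper. It keeps the letter scheme for narrow frames and switches to numeric suffixes for wider ones.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (plain C++): builds names the way the answer's loop
// does (stem + 'a', stem + 'b', ...) when there are at most 26 columns.
// 'a' + i walks past 'z' into '{', '|', ... for i >= 26, so wider frames
// get a 1-based numeric suffix instead.
std::vector<std::string> make_column_names(const std::string& stem,
                                           std::size_t ncol) {
    std::vector<std::string> names;
    names.reserve(ncol);  // final size is known, so no repeated growth
    for (std::size_t i = 0; i < ncol; ++i) {
        if (ncol <= 26) {
            names.push_back(stem + static_cast<char>('a' + i));
        } else {
            names.push_back(stem + std::to_string(i + 1));
        }
    }
    return names;
}
```

Rcpp::wrap() converts a std::vector<std::string> into an R character vector, so a helper like this could, in principle, feed the "names" attribute directly instead of growing namevec with push_back.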

Update:

Using @Dirk's suggestions (thanks!), this seems to be a simpler, efficient solution:

RcppExport SEXP makeDataFrame(SEXP in) {
    Rcpp::DataFrame dfin(in);
    Rcpp::List myList(dfin.length());
    Rcpp::CharacterVector namevec;
    std::string namestem = "Column Heading ";
    for (int i=0;i<dfin.length();i++) {
        myList[i] = dfin(i); // adding vectors
        namevec.push_back(namestem+std::string(1,(char)(((int)'a') + i))); // making up column names
    }
    myList.attr("names") = namevec;
    Rcpp::DataFrame dfout(myList);
    return dfout;
}
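To see why preallocating the list helps, here is a rough plain C++ model (not Rcpp) of the two strategies. It assumes, as described above for Rcpp's push_back, that growing a fixed-length vector means allocating a new one and copying everything across. The function names are hypothetical. Under that assumption, appending n elements one at a time costs n(n+1)/2 element writes, while allocating once and assigning by index costs n.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Models push_back on a fixed-length R vector: each append allocates a
// vector one element longer and copies the old contents across.
// Returns the number of element copies plus the appended elements.
std::size_t writes_with_push_back(std::size_t n) {
    std::size_t writes = 0;
    std::vector<double> v;  // stands in for a fixed-length R vector
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<double> grown(v.size() + 1);
        for (std::size_t j = 0; j < v.size(); ++j) {
            grown[j] = v[j];  // copy an existing element
            ++writes;
        }
        grown[v.size()] = static_cast<double>(i);  // the new element
        ++writes;
        v = std::move(grown);
    }
    return writes;
}

// Models Rcpp::List myList(dfin.length()) followed by myList[i] = ...:
// the final length is allocated once and each slot is written exactly once.
std::size_t writes_with_preallocation(std::size_t n) {
    std::size_t writes = 0;
    std::vector<double> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<double>(i);
        ++writes;
    }
    return writes;
}
```

For the 11 columns of mtcars the difference is small (66 versus 11), but the quadratic growth is what makes push_back a poor fit inside longer loops.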

I concur with joran. When a C function is called from R via .C(), the result is a list of all of its arguments, both "in" and "out", so each "column" of the data frame can be represented as an argument in the C function call. Once the result is back in R, all that remains is to extract those list elements using list indexing and give them the appropriate names.
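A sketch of what such a routine might look like on the C side, written in C++ with extern "C" linkage so that .C() can find the unmangled symbol. The routine name fill_columns and the two output formulas are hypothetical. The R call in the comments (which assumes the routine is compiled into a loaded shared library) shows how the returned list would be turned into a data frame.

```cpp
// Hypothetical .C()-style routine (name and formulas are illustrative).
// .C() passes every argument as a pointer to a copy of the R vector; the
// routine reads its inputs and fills its output vectors in place.
//
// From R, assuming the shared library is loaded:
//   res <- .C("fill_columns", x = as.double(x), n = as.integer(length(x)),
//             doubled = double(length(x)), squared = double(length(x)))
//   df  <- data.frame(doubled = res$doubled, squared = res$squared)
extern "C" void fill_columns(const double* x, const int* n,
                             double* doubled, double* squared) {
    for (int i = 0; i < *n; ++i) {
        doubled[i] = 2.0 * x[i];     // first output "column"
        squared[i] = x[i] * x[i];    // second output "column"
    }
}
```

Because .C() names the returned list after its arguments, res$doubled and res$squared pick out the output columns directly.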

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow