If I understand correctly, you're hoping that strata( uniqueStrata(i) )
will subset the vector, similar to how R's subsetting operates. This is unfortunately not the case; you would have to perform the subsetting 'by hand'. Rcpp
doesn't have 'generic' subsetting operates available yet.
When it comes to using Rcpp
, you really want to leverage the C++ standard library where possible. The de-facto C++ way of generating these counts would be to use a std::map
(or std::unordered_map
, if you can assume C++11), with something like the following. I include a benchmark for interest.
Note from Dirk: unordered_map
is actually available from tr1
for pre-C++11, so one can include it using e.g. #include <tr1/unordered_map>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector LengthStrata( DataFrame dataset ) {
Rcpp::CharacterVector strata = dataset["strata2013"];
int n = strata.size();
std::map<SEXP, int> counts;
for (int i = 0; i < n; ++i) {
++counts[ strata[i] ];
}
return wrap(counts);
}
/*** R
library(data.table)
library(microbenchmark)
set.seed(45)
DF <- data.frame(strata2013=sample(letters, 1E5, TRUE))
DT <- data.table(DF)
LengthStrata(DF)
DT[, .N, by=strata2013]
microbenchmark(
LengthStrata(DF),
DT[, .N, by=strata2013]
)
*/
gives me
Unit: milliseconds
expr min lq median uq max neval
LengthStrata(DF) 3.267131 3.831563 3.934992 4.101050 11.491939 100
DT[, .N, by = strata2013] 1.980896 2.360590 2.480884 2.687771 3.052583 100
The Rcpp
solution is slower in this case likely due to the time it takes to move R objects to and from the C++ containers, but hopefully this is instructive.
Aside: This is, in fact, already included in Rcpp
as the sugar table
function, so if you want to skip the learning experience, you can use a pre-baked solution as
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector LengthStrata( DataFrame dataset ) {
Rcpp::CharacterVector strata = dataset["strata2013"];
return table(strata);
}
Sugar improves the speed of the Rcpp function:
Unit: milliseconds
expr min lq median uq max neval
LengthStrata(DF) 5.548094 5.870184 6.014002 6.448235 6.922062 100
DT[, .N, by = strate2013] 6.526993 7.136290 7.462661 7.949543 81.233216 100