Building on Kevin's answer, here is a C++11 version using a slightly different approach:
    #include <Rcpp.h>
    #include <unordered_map>
    #include <vector>
    #include <algorithm>
    #include <utility>
    using namespace Rcpp ;

    // [[Rcpp::plugins(cpp11)]]

    // [[Rcpp::export]]
    List rowCounts_2(IntegerMatrix x){
        int n  = x.nrow() ;
        int nc = x.ncol() ;

        // hash each row: treat the 0/1 entries as bits of an integer
        std::vector<int> hashes(n) ;
        for( int k=0, pow=1; k<nc; k++, pow*=2){
            IntegerMatrix::Column column = x.column(k) ;
            std::transform( column.begin(), column.end(), hashes.begin(), hashes.begin(),
                [=]( int v, int h ){
                    return h + pow*v ;
            }) ;
        }

        // key: the hash, value: (1-based index of first row with that hash, count)
        using Pair = std::pair<int,int> ;
        std::unordered_map<int, Pair> map_counts ;
        for( int i=0; i<n; i++){
            Pair& p = map_counts[ hashes[i] ] ;
            if( p.first == 0){
                p.first = i+1 ; // 1-based index, so 0 means "not seen yet"
            }
            p.second++ ;
        }

        // collect the results from the hash map
        int nres = map_counts.size() ;
        IntegerVector idx(nres), counts(nres) ;
        auto it = map_counts.begin() ;
        for( int i=0; i<nres; i++, ++it){
            idx[i]    = it->second.first ;
            counts[i] = it->second.second ;
        }
        return List::create( _["counts"] = counts, _["idx"] = idx ) ;
    }
The idea is to trade memory for speed. The first change is that I allocate and fill a std::vector<int> to hold the hashes. This lets me traverse the input matrix column by column, which is more efficient since R stores matrices column-major. Once the hashes are computed, I populate a hash map std::unordered_map<int, std::pair<int,int>>: the key is the hash, the value is a pair (index, count). Then I just traverse the hash map and collect the results. The results don't appear in ascending order of idx (it is easy to do that if we really want it).
I get these results for n=1e5 and n=1e7:
> m <- matrix(sample(0:1, 1e+05, TRUE), ncol = 10)
> microbenchmark(rowCounts(m), rowCountsR(m), rowCounts_2(m))
Unit: microseconds
expr min lq median uq max neval
rowCounts(m) 1194.536 1201.273 1213.1450 1231.7295 1286.458 100
rowCountsR(m) 575.004 933.637 962.8720 981.6015 23678.451 100
rowCounts_2(m) 421.744 429.118 442.5095 455.2510 530.261 100
> m <- matrix(sample(0:1, 1e+07, TRUE), ncol = 10)
> microbenchmark(rowCounts(m), rowCountsR(m), rowCounts_2(m))
Unit: milliseconds
expr min lq median uq max neval
rowCounts(m) 97.22727 98.02716 98.56641 100.42262 102.07661 100
rowCountsR(m) 57.44635 59.46188 69.34481 73.89541 100.43032 100
rowCounts_2(m) 22.95741 23.38186 23.78068 24.16814 27.44125 100
Taking advantage of threading helps further. Below is how the time is split between 4 threads on my machine. See the code in this gist.
Here are benchmarks including the threaded version (rowCounts_3) as well:
> microbenchmark(rowCountsR(m), rowCounts_1(m), rowCounts_2(m), rowCounts_3(m,4))
Unit: milliseconds
expr min lq median uq max neval
rowCountsR(m) 93.67895 127.58762 127.81847 128.03472 151.54455 100
rowCounts_1(m) 120.47675 120.89169 121.31227 122.86422 137.86543 100
rowCounts_2(m) 28.88102 29.68101 29.83790 29.97112 38.14453 100
rowCounts_3(m, 4) 12.50059 12.68981 12.87712 13.10425 17.21966 100