Question

Wondering if there's a faster way to achieve the following:

I have a vector of max. length long1 which is all zeros. How do I generate a matrix of all possible combinations of ones in certain positions up to and including the maximum max1s.

The following works, but seems to be rather inefficient when the return matrix is large e.g. >1e5.

### longest series of zeros
long1 <- 4
### max no. of 1s
max1s <- 2
### get combinations up to long1
f1 <- function(i) t(combinat::combn(seq.int(long1),
                                    m=i, simplify=TRUE))
### list of positions in target matrix to be made 1s
### each list element represents column positions of 1s
l1 <- sapply(1:max1s, f1)
### no. rows in return matrix
nrow1 <- sum(unlist(lapply(l1, nrow)))
### set up matrix of zeros
c2 <- matrix(0L, nrow=nrow1, ncol=long1)
### rows to start at for each 'i' in length(l1) below
nrow2 <- c(1, 1+cumsum(unlist(lapply(l1, nrow))))
for (i in 1:length(l1)){
    for (j in 1:nrow(l1[[i]])){
### now iterate over each row in that element of l1
### set relevant position in matrix to 1
        c2[nrow2[i]+(j-1), l1[[i]][j, ] ] <-  1L
    }}

In this case it's all combinations of 1s, up to a max of 2, in a vector of length 4:

> c2
      [,1] [,2] [,3] [,4]
 [1,]    1    0    0    0
 [2,]    0    1    0    0
 [3,]    0    0    1    0
 [4,]    0    0    0    1
 [5,]    1    1    0    0
 [6,]    1    0    1    0
 [7,]    1    0    0    1
 [8,]    0    1    1    0
 [9,]    0    1    0    1
[10,]    0    0    1    1

I would prefer to avoid using combinat::hcube then eliminating rows with more than a certain no. of 1s as this approach will create needlessly large matrices for an application like this.

Was it helpful?

Solution

I guess you could just separately compute the combinations of each size using combn and then use do.call with rbind to combine them all together:

allcombo <- function(long1, max1s) {
  do.call(rbind, lapply(1:max1s, function(num1) {
    t(apply(combn(long1, num1), 2, function(x) {
      col = rep(0, long1)
      col[x] = 1
      col
    }))
  }))
}

I've stored your posted solution in function OP. We can check they return the same values:

all.equal(OP(20, 5), allcombo(20, 5))
# [1] TRUE

Now we can benchmark (there are 21699 returned rows):

library(microbenchmark)
microbenchmark(OP(20, 5), allcombo(20, 5))
# Unit: milliseconds
#             expr      min       lq   median       uq      max neval
#        OP(20, 5) 242.4120 256.5791 269.7237 292.7131 556.5984   100
#  allcombo(20, 5) 150.4291 179.2588 188.4840 200.9898 448.2214   100

So this approach using combn is a bit faster (30% on my computer for this set of parameters).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top