Generalize R %in% operator to match tuples

Question 1

I'm wondering if you have made this a bit more complicated than it is. For example,

set.seed(1618)
vec <- c(1,3)
mat <- matrix(rpois(1000,3), ncol = 2)
rownames(mat) <- 1:nrow(mat)


mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]

# gives me this
#     [,1] [,2]
# 6      3    1
# 38     3    1
# 39     3    1
# 85     1    3
# 88     1    3
# 89     1    3
# 95     3    1
# 113    1    3
# ...

you could subset this further if you care about the order or you could modify the function slightly:

mat[sapply(1:nrow(mat), function(x) 
  all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]

#      [,1] [,2]
# 85     1    3
# 88     1    3
# 89     1    3
# 113    1    3
# 133    1    3
# 139    1    3
# 187    1    3
# ...

another example with a longer vector

set.seed(1618)
vec <- c(1,4,5,2)
mat <- matrix(rpois(10000, 3), ncol = 4)
rownames(mat) <- 1:nrow(mat)

mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]

#      [,1] [,2] [,3] [,4]
# 57      2    5    1    4
# 147     1    5    2    4
# 279     1    2    5    4
# 303     1    5    2    4
# 437     1    5    4    2
# 443     1    4    5    2
# 580     5    4    2    1
# ...

I see a couple that match:

mat[sapply(1:nrow(mat), function(x) 
  all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]

#      [,1] [,2] [,3] [,4]
# 443     1    4    5    2
# 901     1    4    5    2
# 1047    1    4    5    2

but only three

for your single row case:

vec <- c(1,4,5,2)
mat <- matrix(c(1,4,5,2), ncol = 4)
rownames(mat) <- 1:nrow(mat)

mat[sapply(1:nrow(mat), function(x) 
  all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]

# [1] 1 4 5 2

here is a simple function with the above code

is.tuplein <- function(vec, mat, exact = TRUE) {  
  rownames(mat) <- 1:nrow(mat)
  if (exact) 
    tmp <- mat[sapply(1:nrow(mat), function(x) 
      all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
  else tmp <- mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
  return(tmp)
}

is.tuplein(vec = vec, mat = mat)
# [1] 1 4 5 2

seems to work, so let's make our own %in% operator:

`%tuple%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = TRUE)
`%tuple1%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = FALSE)

and try her out

set.seed(1618)
c(1,2,3) %tuple% matrix(rpois(1002,3), ncol = 3)

#     [,1] [,2] [,3]
# 133    1    2    3
# 190    1    2    3
# 321    1    2    3

set.seed(1618)
c(1,2,3) %tuple1% matrix(rpois(1002,3), ncol = 3)

#     [,1] [,2] [,3]
# 48     2    3    1
# 64     2    3    1
# 71     1    3    2
# 73     3    1    2
# 108    3    1    2
# 112    1    3    2
# 133    1    2    3
# 166    2    1    3

Question 2

Does this do what you want (even for more than 2 columns)?

paste(row.vec,collapse="_") %in% apply(data.set,1,paste,collapse="_")