Question

I would like to subset a data.table by requiring a match in one column and excluding a match in another, using the J() and !J() functions.

library(data.table)
DT <- data.table(x = rep(c("a", "b", "c"), each = 2000), y = rep(c(1, 3, 6), length.out = 6000), key = c("x", "y"))

I would like J() and !J() to give the same result as the code below:

DT[J("b")][y != 1]

I tried the following, which produced an error:

DT[J("b")][!J(x, 1)]

Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x),  : 
  Join results in 1920000 rows; more than 4800 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.

I also tried the code below, but it did not apply the second condition, which is to exclude rows where y is 1:

DT[J("b")][!J("1")]

Solution

This answer came from Arun; all credit goes to him.

library(data.table)
DT <- data.table(x = rep(c("a", "b", "c"), each = 2000), y = rep(c(1, 3, 6), length.out = 6000), key = c("x", "y"))

DT["b"][!J(unique(x), 1)]

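This keeps every row where column x is "b" and drops the rows among them where column y is 1. Inside the second [ ], x refers to the column of the intermediate result, so unique(x) evaluates to "b" and !J(unique(x), 1) is a not-join against the single key pair ("b", 1). Passing x without unique() supplies one duplicate key row per row of the subset, which is what triggers the cartesian-join error shown in the question.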
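
As a quick check, the sketch below compares the not-join with the plain filter from the question. The `via_on` variant and the `same_rows()` helper are illustrative additions, not part of Arun's answer, and `via_on` assumes a data.table version that supports the `on` argument (1.9.6 or later).

```r
library(data.table)

# Toy data as in the question; length.out makes the recycling of y explicit.
DT <- data.table(x = rep(c("a", "b", "c"), each = 2000),
                 y = rep(c(1, 3, 6), length.out = 6000),
                 key = c("x", "y"))

# Baseline from the question: join on x, then an ordinary filter on y.
expected <- DT[J("b")][y != 1]

# The not-join from the answer, relying on the key (x, y).
via_key <- DT["b"][!J(unique(x), 1)]

# Assumed alternative: name the join columns with `on`, so the second step
# does not depend on the intermediate result keeping its key.
via_on <- DT[J("b"), on = "x"][!J("b", 1), on = c("x", "y")]

# Hypothetical helper: TRUE when both tables hold the same rows in the same order.
same_rows <- function(a, b) identical(a$x, b$x) && identical(a$y, b$y)

same_rows(expected, via_key)
same_rows(expected, via_on)
```

If the key is not needed for anything else, DT[x == "b" & y != 1] expresses the same subset without any join syntax.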

Licensed under: CC-BY-SA with attribution