Subset a data frame based on value pairs stored in independent ordered vectors

https://stackoverflow.com/questions/22492884

r
subset

17-06-2023

Question

I have an R dataframe that I need to subset data from. The subsetting will be based on two columns in the dataframe. For example:

A <- c(1,2,3,3,5,1)
B <- c(6,7,8,9,8,8)
Value <- c(9,5,2,1,2,2)
DATA <- data.frame(A,B,Value)

This is how DATA looks:

  A B Value
1 1 6     9
2 2 7     5
3 3 8     2
4 3 9     1
5 5 8     2
6 1 8     2

I want the rows whose (A,B) combination is (1,6) or (3,8). These pairs are stored as two separate ordered vectors, one for A and one for B:

AList <- c(1,3)
BList <- c(6,8)

I am trying to subset the data by checking whether the value in column A is present in AList AND the value in column B is present in BList:

DATA[(DATA$A %in% AList & DATA$B %in% BList),]

In addition to the value pairs (1,6) and (3,8), the subsetted result also contains (1,8). This filter returns rows for every combination of values in AList and BList, not just the paired ones. How do I restrict it to just (1,6) and (3,8)?

This is my desired result:

A B Value
1 6     9
3 8     2

Solution 2

You could try match with an appropriate nomatch argument (the two different nomatch values ensure that rows matching neither list do not compare as equal):

sub <- match(DATA$A, AList, nomatch=-1) == match(DATA$B, BList, nomatch=-2)
sub
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE

DATA[sub,]
#  A B Value
#1 1 6     9
#3 3 8     2
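
One caveat worth checking before relying on the match comparison: match returns only the first position of each value, so it can miss pairs when a value is repeated in the key vectors. The sketch below uses hypothetical key vectors AList2 and BList2 (not part of the question) to show this, together with one possible pairwise alternative built on outer. It is an illustration under those assumptions, not a replacement for the approach above.

```r
# Same example data as in the question
A <- c(1, 2, 3, 3, 5, 1)
B <- c(6, 7, 8, 9, 8, 8)
Value <- c(9, 5, 2, 1, 2, 2)
DATA <- data.frame(A, B, Value)

# Hypothetical keys: the pairs (1,6) and (1,8), so the value 1 repeats in AList2
AList2 <- c(1, 1)
BList2 <- c(6, 8)

# match() only reports the first position of 1 in AList2,
# so the pair (1,8) in row 6 is not detected
by_match <- match(DATA$A, AList2, nomatch = -1) ==
  match(DATA$B, BList2, nomatch = -2)

# Compare every row against every key pair: column j of the matrix is
# (A == AList2[j] & B == BList2[j]); a row is kept if any column is TRUE
by_pair <- rowSums(outer(DATA$A, AList2, "==") &
                   outer(DATA$B, BList2, "==")) > 0

by_match
by_pair
```

With unique values in AList, as in the original question, both vectors agree.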

A paste-based approach would also be possible:

sub <- paste(DATA$A, DATA$B, sep=":") %in% paste(AList, BList, sep=":")
sub
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE

DATA[sub,]
#  A B Value
#1 1 6     9
#3 3 8     2
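
The paste approach is safe for the numeric columns in the question. If the key columns were character data that could contain the separator, different pairs might collapse to the same string. A small sketch with made-up vectors x1 and x2 (hypothetical, not from the question) shows the collision and one way to reduce the risk, assuming the chosen separator never occurs in the data:

```r
# Hypothetical character keys that themselves contain the separator ":"
x1 <- c("a:b", "a")
x2 <- c("c", "b:c")

# Both pairs collapse to "a:b:c", so they can no longer be told apart
collided <- paste(x1, x2, sep = ":")

# A separator assumed not to occur in the data keeps the pairs distinct
safe <- paste(x1, x2, sep = "\r")

collided[1] == collided[2]
safe[1] == safe[2]
```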

OTHER TIPS

This is a job for merge:

KEYS <- data.frame(A = AList, B = BList)
merge(DATA, KEYS)

#   A B Value
# 1 1 6     9
# 2 3 8     2

Edit: after the OP expressed his preference for a logical vector in the comments below, I would suggest one of the following.

Use merge:

df.in.df <- function(x, y) {
  common.names <- intersect(names(x), names(y))
  idx <- seq_len(nrow(x))
  x <- x[common.names]
  y <- y[common.names]
  x <- transform(x, .row.idx = idx)
  idx %in% merge(x, y)$.row.idx
}

or interaction:

df.in.df <- function(x, y) {
  common.names <- intersect(names(x), names(y))
  interaction(x[common.names]) %in% interaction(y[common.names])
}

In both cases:

df.in.df(DATA, KEYS)
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE
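
As a usage sketch, the logical vector can be used to index DATA directly, which (unlike a plain merge) keeps the original row names and also makes the complement easy to get. The block repeats the interaction variant of df.in.df so that it runs on its own; the names keep, matched and unmatched are only illustrative.

```r
# Example data and key pairs from the question
A <- c(1, 2, 3, 3, 5, 1)
B <- c(6, 7, 8, 9, 8, 8)
Value <- c(9, 5, 2, 1, 2, 2)
DATA <- data.frame(A, B, Value)
KEYS <- data.frame(A = c(1, 3), B = c(6, 8))

# interaction-based variant from above
df.in.df <- function(x, y) {
  common.names <- intersect(names(x), names(y))
  interaction(x[common.names]) %in% interaction(y[common.names])
}

keep <- df.in.df(DATA, KEYS)

matched <- DATA[keep, ]     # rows whose (A,B) pair is in KEYS
unmatched <- DATA[!keep, ]  # all remaining rows

matched
unmatched
```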

Licensed under: CC-BY-SA with attribution
