Question

I'm trying to find the easiest way to filter a data set with minimal syntax. This example contains minimal data, but I'm trying to find a way to generalize it to much larger data sets.

Here's my sample data set:

samp <- structure(list(group = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L), .Label = c("a", "b", "c", "d"), class = "factor"), name = structure(c(5L, 
3L, 7L, 2L, 6L, 8L, 4L, 1L), .Label = c("hollis", "jo", "joe", 
"mike", "pat", "scott", "steph", "tim"), class = "factor")), .Names = c("group", 
"name"), class = "data.frame", row.names = c(NA, -8L))

Hypothetically, I want to filter down to where group == 'a' | group == 'b'.

I've tried match, but it only returns the first match for each value:

filt <- c('a', 'b')
samp[match(filt, samp$group), ]
  group name
1     a  pat
2     b  joe
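To see why match falls short, compare its result with %in%: match(x, table) gives, for each element of x, the position of its first occurrence in table, so a two-element filt can select at most two rows. A quick check, assuming samp is rebuilt compactly with data.frame() and factor() so the snippet runs on its own (same values as the dput above):

```r
# Compact rebuild of the sample data (same values as the dput above)
samp <- data.frame(
  group = factor(c("a", "b", "c", "d", "a", "b", "c", "d")),
  name  = factor(c("pat", "joe", "steph", "jo", "scott", "tim", "mike", "hollis"))
)
filt <- c("a", "b")

# match(): one position per element of filt, i.e. only the first hit for each
first_hits <- match(filt, samp$group)
print(first_hits)   # [1] 1 2

# which() + %in%: every row whose group appears in filt
all_hits <- which(samp$group %in% filt)
print(all_hits)     # [1] 1 2 5 6
```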

I've tried filter, but the syntax can get lengthy with lots of filtering arguments.

library(dplyr)
filter(samp, group == 'a' | group == 'b')

  group  name
1     a   pat
2     b   joe
3     a scott
4     b   tim

Ideally, I'd like something like the following, returning the same four rows:

library(dplyr)
filt <- c('a', 'b')
filter(samp, group == any(filt))

  group  name
1     a   pat
2     b   joe
3     a scott
4     b   tim

Unfortunately, this returns zero rows and a warning instead:

[1] group name 
<0 rows> (or 0-length row.names)
Warning message:
In any(c("a", "b")) : coercing argument of type 'character' to logical
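The zero-row result comes from any(), not from filter(). any() expects logical input, so it coerces c("a", "b") to logical; neither string is a recognised logical value such as "TRUE" or "false", so both become NA, and any(NA, NA) is NA. The condition then compares every group to NA, which is NA for every row, and filter() drops rows whose condition is NA. A base R sketch of those steps (the group vector here is an assumption, copied from the sample data):

```r
filt <- c("a", "b")

# any() coerces the character vector to logical; "a" and "b" become NA
coerced <- suppressWarnings(as.logical(filt))
print(coerced)   # [1] NA NA

# so any(filt) is NA (plus the coercion warning shown above)
res <- suppressWarnings(any(filt))
print(res)       # [1] NA

# comparing the factor column to NA gives NA for every element,
# and filter() keeps only rows where the condition is TRUE
group <- factor(c("a", "b", "c", "d", "a", "b", "c", "d"))
cond <- group == res
print(cond)      # all eight values are NA
```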

Thanks in advance for your help and suggestions!


Solution

Try %in%:

samp[samp$group %in% c("a", "b"), ]
#   group  name
# 1     a   pat
# 2     b   joe
# 5     a scott
# 6     b   tim
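%in% works because it is a thin wrapper around match(): ?match documents it as match(x, table, nomatch = 0) > 0. With the column as x and the filter values as table, it asks, for each row, whether that row's group appears anywhere in the values, giving one TRUE/FALSE per row, which is exactly what [ needs. More values only lengthen the vector, not the expression. A small check, again assuming the compact rebuild of samp:

```r
samp <- data.frame(
  group = factor(c("a", "b", "c", "d", "a", "b", "c", "d")),
  name  = factor(c("pat", "joe", "steph", "jo", "scott", "tim", "mike", "hollis"))
)
filt <- c("a", "b")

# %in% as documented in ?match: misses become 0, then "> 0" gives a logical
via_match <- match(samp$group, filt, nomatch = 0) > 0
via_in    <- samp$group %in% filt
print(identical(via_match, via_in))   # [1] TRUE

# one logical per row
print(via_in)
# [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE

# adding a third value changes only the vector, not the filtering expression
abd <- samp[samp$group %in% c("a", "b", "d"), ]
```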

The dplyr approach you're looking for is probably this:

library(dplyr)
filter(samp, group %in% c("a", "b"))
#   group  name
# 1     a   pat
# 2     b   joe
# 3     a scott
# 4     b   tim

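This is similar to base R's subset(samp, subset = group %in% c("a", "b")), but read the warning at ?subset before using it non-interactively: subset() is meant as an interactive convenience, and its non-standard evaluation of the subset argument can have unanticipated consequences inside functions.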
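For programmatic use, one option is to keep the %in% idea but switch to standard [ indexing, passing the column name as a string. The helper below, filter_in(), is a hypothetical name used only for illustration; it is not a base R or dplyr function:

```r
# Hypothetical helper: keep rows of df whose column `col` takes one of `values`.
# Uses standard evaluation (df[[col]]), so it behaves the same inside other functions.
filter_in <- function(df, col, values) {
  keep <- df[[col]] %in% values
  df[keep, , drop = FALSE]
}

samp <- data.frame(
  group = factor(c("a", "b", "c", "d", "a", "b", "c", "d")),
  name  = factor(c("pat", "joe", "steph", "jo", "scott", "tim", "mike", "hollis"))
)

ab <- filter_in(samp, "group", c("a", "b"))
print(ab)
#   group  name
# 1     a   pat
# 2     b   joe
# 5     a scott
# 6     b   tim
```

Because the column and values are ordinary arguments, the same call works for any column and any length of value vector.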

Licensed under: CC-BY-SA with attribution