Question

Given these two vectors:

a <- c("Look at the sky", "Sun is bright", "cloudy day")
b <- c("sky", "day")

I want to keep only the elements of a that contain at least one of the strings in b. The desired result is:

"Look at the sky", "cloudy day"

How can I do this in R?


Solution

Option 1

You can match a against every term in b with sapply. The result is a logical matrix with one row per element of a and one column per term in b:

sapply(b, grepl, a)

       sky   day
[1,]  TRUE FALSE
[2,] FALSE FALSE
[3,] FALSE  TRUE

Then collapse each row with apply(..., 1, any), which is TRUE when at least one term matched, and use the result to subset a:

a[apply(sapply(b, grepl, a), 1, any)]

[1] "Look at the sky" "cloudy day"     
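
Option 1 treats each term in b as a regular expression. If the terms should instead be matched as literal text, a minimal sketch along the same lines could pass fixed = TRUE to grepl and combine the per-term results with Reduce. The names hits and keep are only illustrative and are not part of the original answer.

```r
a <- c("Look at the sky", "Sun is bright", "cloudy day")
b <- c("sky", "day")

# One logical vector per term in b, each the same length as a.
# fixed = TRUE makes grepl treat every term as a literal string, not a regex.
hits <- lapply(b, grepl, x = a, fixed = TRUE)

# Element-wise OR across the terms: TRUE where at least one term matched.
keep <- Reduce(`|`, hits)

a[keep]
```

For the vectors in the question this selects the same two elements as the apply version, without building the intermediate matrix.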

Option 2

Combine the terms into a single regular expression using alternation:

paste(b, collapse="|")

[1] "sky|day"

and pass it to grepl:

a[grepl(paste(b, collapse="|"), a)]

[1] "Look at the sky" "cloudy day"     
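
One caveat with the combined pattern: any regex metacharacter in b (for example $, . or parentheses) is interpreted as regex syntax, so a term such as "$5" would be read as an end-of-line anchor followed by 5. The sketch below escapes the terms before joining them. The helper escape_regex and the sample vectors are hypothetical, written for illustration only, and the helper may not cover every edge case.

```r
# Sample data (not from the question) where one term contains a metacharacter.
a <- c("Price is $5 (approx.)", "Sun is bright", "cloudy day")
b <- c("$5", "day")

# Hypothetical helper: put a backslash in front of common regex
# metacharacters so that each term matches literally.
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

# Join the escaped terms with alternation and subset as in Option 2.
pattern <- paste(escape_regex(b), collapse = "|")
a[grepl(pattern, a)]
```

If every term is plain text, as in the question, the escaping step changes nothing and the simpler paste(b, collapse="|") is enough.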

OTHER TIPS

Try the string searching facilities from the stringi package:

library(stringi)
a[sapply(a, function(ae) any(stri_detect_fixed(ae, b)))]
## [1] "Look at the sky" "cloudy day"

Here we check, for each string in a, whether it contains any string in b as a literal (fixed) substring.

Licensed under: CC-BY-SA with attribution