Question

I have two character vectors like the ones below:

> a_i
 [1] "Our-Facebook-Page/td-p/3175990"                           
 [2] "Our-Facebook-Page/td-p/3175990/page/2"  
   ....    
[17] "Data-duplicate-files/td-p/4743405"                  
[18] "Data-duplicate-files/td-p/4743405/page/2"            
[19] "Subscription-Release-1-sucks/td-p/4556739"       
[20] "Subscription-Release-1-sucks/td-p/4556739/page/2"

> b_i
[1] "Data-duplicate-files/td-p/4743405"                  
[2] "Subscription-Release-1-sucks/td-p/4556739"
[3] "Quick-fix/td-p/4556740"

My goal is to find the 7-digit numbers that appear in b_i (e.g. 4743405, 4556739, 4556740) and grab the elements of a_i that contain the corresponding numbers. So the final output will be something like this:

[1] "Data-duplicate-files/td-p/4743405"                  
[2] "Data-duplicate-files/td-p/4743405/page/2"            
[3] "Subscription-Release-1-sucks/td-p/4556739"       
[4] "Subscription-Release-1-sucks/td-p/4556739/page/2"

I am able to get the numbers using strsplit(b_i, "/"), but I'm stuck on grabbing the elements that contain matching numbers. Is there an elegant way to map those numbers and grab the matching elements?

Solution 2

Your data are not in a reproducible format, so I haven't tried this, but it takes a slightly different approach than IShouldBUyABoat by just enforcing a 7-digit rule to identify numbers:

sapply(regmatches(b_i,regexpr("[[:digit:]]{7}", b_i)),
       function(x) a_i[grepl(x, a_i)])
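
Since the snippet above is untested, here is a minimal, self-contained sketch of the same 7-digit idea. The vectors are a shortened stand-in for the asker's data (the elided elements of a_i are simply left out), and the names ids, matches and result are only illustrative. Note that "[[:digit:]]{7}" also matches the first seven digits of a longer run, so this assumes the IDs are exactly seven digits long.

```r
# Shortened stand-in for the vectors in the question (elided elements omitted)
a_i <- c("Our-Facebook-Page/td-p/3175990",
         "Our-Facebook-Page/td-p/3175990/page/2",
         "Data-duplicate-files/td-p/4743405",
         "Data-duplicate-files/td-p/4743405/page/2",
         "Subscription-Release-1-sucks/td-p/4556739",
         "Subscription-Release-1-sucks/td-p/4556739/page/2")
b_i <- c("Data-duplicate-files/td-p/4743405",
         "Subscription-Release-1-sucks/td-p/4556739",
         "Quick-fix/td-p/4556740")

# Pull the first run of seven digits out of each element of b_i
ids <- regmatches(b_i, regexpr("[[:digit:]]{7}", b_i))

# For each ID, keep the elements of a_i that contain it;
# an ID with no match in a_i yields character(0)
matches <- lapply(ids, function(id) a_i[grepl(id, a_i, fixed = TRUE)])

# Flatten the list into one character vector (empty results drop out)
result <- unlist(matches)
print(result)
```

Using lapply plus unlist rather than sapply gives a plain character vector, which is closer to the output the question asks for; sapply would return a named list here because the per-ID results have different lengths.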

OTHER TIPS

a_i[grep( paste( gsub("(^.+/)([[:digit:]]+)(/.+)?$", "\\2", b_i), 
                 collapse="|"), 
     a_i)]
[1] "Data-duplicate-files/td-p/4743405"               
[2] "Data-duplicate-files/td-p/4743405/page/2"        
[3] "Subscription-Release-1-sucks/td-p/4556739"       
[4] "Subscription-Release-1-sucks/td-p/4556739/page/2"

This extracts the digit string from each element of b_i and joins the results with a pipe sign to form a grep OR pattern. To enforce the 7-digit rule, replace the + with a {7} repetition quantifier. As written, it accepts any run of digits that follows a forward slash.
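
An alternative to building one large OR pattern is to extract the ID from both vectors and compare them exactly with %in%. The sketch below is one possible way to do that, not the answerer's method. It assumes every ID directly follows "/td-p/", as in the sample data, and uses a shortened stand-in for a_i; thread_id is a hypothetical helper name.

```r
# Shortened stand-in for the vectors in the question (elided elements omitted)
a_i <- c("Our-Facebook-Page/td-p/3175990",
         "Our-Facebook-Page/td-p/3175990/page/2",
         "Data-duplicate-files/td-p/4743405",
         "Data-duplicate-files/td-p/4743405/page/2",
         "Subscription-Release-1-sucks/td-p/4556739",
         "Subscription-Release-1-sucks/td-p/4556739/page/2")
b_i <- c("Data-duplicate-files/td-p/4743405",
         "Subscription-Release-1-sucks/td-p/4556739",
         "Quick-fix/td-p/4556740")

# Hypothetical helper: return the digits that directly follow "/td-p/".
# Assumption: every element has that layout; strings that do not match
# are returned unchanged by sub().
thread_id <- function(x) sub("^.*/td-p/([[:digit:]]+)(/.*)?$", "\\1", x)

# Keep the elements of a_i whose ID also occurs in b_i
keep <- thread_id(a_i) %in% thread_id(b_i)
result <- a_i[keep]
print(result)
```

Because the comparison is on whole IDs, a short ID cannot accidentally match as a substring of a longer one, and the result keeps the original order of a_i.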

Licensed under: CC-BY-SA with attribution