Question

I am trying to check whether the elements of one vector match the first four digits of the elements of a second vector (they are nested identifiers), and I'm not sure how to run the match. For example:

X     Y 
1111  111120
1111  890933
2222  780777
2222  222247

I would like to create code to tell me whether the first four digits of element i in vector y match the digits in element i in vector x. Extending the example, I hope to see:

True
False
False
True

Thanks for any thoughts.


Solution

Using apply to loop over the rows with grepl will work. Anchor the pattern with ^ so that it only matches at the start of the string:

apply( df , 1 , function(x) grepl( paste0( "^" , x[1] ) , x[2] ) )
#[1]  TRUE FALSE FALSE  TRUE
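
A pattern passed to grepl without a ^ anchor matches anywhere in the string, so a value such as 901111 would also count as a match for 1111. The sketch below illustrates that and shows a vectorised alternative using base R's startsWith (available from R 3.3.0), which avoids apply. The data frame is rebuilt from the question's example, and the name res is only illustrative.

```r
# Data from the question, rebuilt so the example runs on its own.
df <- data.frame(X = c(1111, 1111, 2222, 2222),
                 Y = c(111120, 890933, 780777, 222247))

# An unanchored pattern matches anywhere in the string.
unanchored <- grepl("1111", "901111")    # TRUE, although the string starts with 9011
# Anchoring with ^ restricts the match to the start of the string.
anchored <- grepl("^1111", "901111")     # FALSE

# Vectorised prefix test: does each Y (as text) start with the matching X?
res <- startsWith(as.character(df$Y), as.character(df$X))
res
# [1]  TRUE FALSE FALSE  TRUE
```

Converting both columns to character first keeps the comparison a plain text-prefix test, which is what "first four digits" asks for.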

OTHER TIPS

Suppose your data.frame is df; substr will do the trick.

> df$X==as.numeric(substr(df$Y, start=1, stop=4))
[1]  TRUE FALSE FALSE  TRUE

Putting it all together in a new data.frame:

> transform(df, Z=df$X==as.numeric(substr(df$Y, start=1, stop=4)))
     X      Y     Z
1 1111 111120  TRUE
2 1111 890933 FALSE
3 2222 780777 FALSE
4 2222 222247  TRUE

Take a look at ?substr for further details on how it works.
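
As a rough illustration of what substr does here, the sketch below rebuilds the question's data and inspects the intermediate prefix before comparing. The names prefix and match_flag are made up for this example, and the comparison is done on text rather than converting back with as.numeric, on the assumption that identifiers may be stored as character (for instance with leading zeros).

```r
# Data from the question, rebuilt so the example runs on its own.
df <- data.frame(X = c(1111, 1111, 2222, 2222),
                 Y = c(111120, 890933, 780777, 222247))

# substr() coerces its input to character, then extracts positions 1 to 4.
prefix <- substr(df$Y, start = 1, stop = 4)
prefix
# [1] "1111" "8909" "7807" "2222"

# Compare as text; X is converted to character to match.
match_flag <- prefix == as.character(df$X)
match_flag
# [1]  TRUE FALSE FALSE  TRUE
```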

Licensed under: CC-BY-SA with attribution