Here is a generic, very efficient solution for this type of problem:
data.1.ID <- paste(data.1[,1],data.1[,2],data.1[,3])
keep.these.ID <- paste(keep.these[,1],keep.these[,2],keep.these[,3])
desired.result <- data.1[data.1.ID %in% keep.these.ID,]
I have simply created a unique ID for each record and then searched for it. (If your key values can themselves contain spaces, pass paste() a separator that cannot occur in the data, so the IDs stay unique.) Note: this will change the row names, so you may want to add the following:
row.names(desired.result) <- 1:nrow(desired.result)
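To make the approach concrete, here is a minimal, self-contained sketch with made-up example data (the contents of data.1 and keep.these below are assumptions for illustration, not from the original question):

```r
# Toy data: data.1 has three key columns plus an extra column BB;
# keep.these lists the key combinations we want to keep.
data.1 <- data.frame(A  = c(1, 1, 2, 2),
                     B  = c("x", "y", "x", "y"),
                     C  = c(10, 20, 10, 20),
                     BB = c("a", "b", "c", "d"))
keep.these <- data.frame(A = c(1, 2),
                         B = c("y", "x"),
                         C = c(20, 10))

# Build one ID string per record by pasting the three key columns together
data.1.ID     <- paste(data.1[, 1], data.1[, 2], data.1[, 3])
keep.these.ID <- paste(keep.these[, 1], keep.these[, 2], keep.these[, 3])

# Keep only the rows whose ID appears in the lookup set
desired.result <- data.1[data.1.ID %in% keep.these.ID, ]
row.names(desired.result) <- 1:nrow(desired.result)  # tidy row names
desired.result  # rows whose (A, B, C) combination appears in keep.these
```

Here the result keeps the second and third rows of data.1, since only those key combinations appear in keep.these.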
EDIT:
Here is another way to solve the same problem. If you have a very large data set, say millions of rows, another very efficient solution is the data.table package. It works roughly 50-100 times faster than merge, depending on how much data you have. All you have to do is the following:
library(data.table)
Step 1: Convert each data.frame to a data.table, with the first three columns as keys.
d1 <- data.table(data.1, key=names(data.1)[1:3])
kt <- data.table(keep.these, key=names(keep.these)[1:3])
Step 2: A merge using data.table's binary search:
d1[kt]
Note 1: note the simplicity of execution. Note 2: this will sort the data by key. To avoid that, try the following:
data.1$index <- 1:nrow(data.1) # Add index to original data
d1 <- data.table(data.1,key=names(data.1)[1:3]) # Step1 as above
kt <- data.table(keep.these,key=names(keep.these)[1:3]) # Step1 as above
d1[kt][order(index)] # Step2 as above
If you want to remove the last two columns (index, BB), that's straightforward too:
d1[kt][order(index)][, -(5:6), with=FALSE]  # Remove the index and BB columns
Try this with large data sets and compare the timing with merge. It's typically about 50-100 times faster.
To learn more about data.table, try:
vignette("datatable-intro")
vignette("datatable-faq")
vignette("datatable-timings")
Or see it in action:
example(data.table)
Hope this helps!