Question

I have two dataframes. In the first one, I have a KEY/ID column and two variables:

KEY V1 V2
1   10  2
2   20  4
3   30  6   
4   40  8
5   50 10

In the second dataframe, I have a KEY/ID column and a third variable

KEY V3 
1    5  
2   10  
3   20  

I would like to extract the rows of the first dataframe that are also in the second dataframe by matching them according to the KEY column. I would also like to add the V3 column to final dataset.

KEY V1 V2 V3 
1   10  2  5
2   20  4 10 
3   30  6 20   

This are my attempts by using the subset and the merge function

subset(data1, data1$KEY == data2$KEY) 
merge(data1, data2, by.x = "KEY", by.y = "KEY")

None of them does the task.

Any hint would be appreaciated. Thank you!

Was it helpful?

Solution

merge(data1, data2, by="KEY") should do it!

OTHER TIPS

If what you want is an inner join, then your attempt should do it. If it doesn't check the formats of Key columns in both the table using class(data1$key).

Apart from these and the merge suggested by Christian, you can use -

library(plyr)
join(data1, data2, by="KEY", type="inner")

or

library(data.table)
setkey(data1, KEY)
setkey(data2, KEY)
data1[,list(data1,data2)]

You could use a dplyr *_join. Given the sample data, both of the following would give the same result:

library(dplyr)
df_merged <- inner_join(data1, data2, by = 'KEY')
df_merged <- right_join(data1, data2, by = 'KEY')

A inner_join returns all rows from df1 where there are matching values in df2, and all columns from df1 and df2.

A right_join returns all rows from df2, and all columns from df1 and df2.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top