Question

I want to double check something. I'm trying to merge a large dataset into a smaller dataset. My large dataframe has observations that are not included in the small dataset. I cannot obtain a simple merge with my dataframes using the standard code

x<-merge(df1,df2) ###default is all=FALSE. 
                  ###output from this code produces a df with 49 rows instead of 13

have also used;

x<-merge(df1,df2, by='noms')  ##output produces 49 rows instead of 13

After much reading and checking for people who have alreday asked this question e.g., Merge 2 data frames, discard unmatched rows i came across this https://stat.ethz.ch/pipermail/r-help/2006-September/113148.html which says that there is no unambiguous way to fix this problem.

Is this still the case? apologies if this has been answered somewhere already, I've tried reading the core documents ?merge and posts on stackoverflow - but am now at a loose end.

my dfs are below

Small dataframe

 noms fruits apple orange kiwi all_comb comb numbers
1  mary  apple     1      0    0        1    1       1
2  mary  grape     0      0    0        0    1       2
3  mary orange     0      1    0        0    1       3
4  mary  apple     1      0    0        1    1       4
5  john banana     0      0    0        0    1       1
6  john  apple     1      0    0        1    1       2
7  john  apple     1      0    0        1    1       3
8  john  apple     1      0    0        1    1       4
9  lucy   kiwi     0      0    1        0    1       1
10 lucy orange     0      1    0        0    1       2
11 lucy  apple     1      0    0        1    1       3
12 lucy  berry     0      0    0        0    1       4
13  tom orange     0      1    0        0    1       1

Large dataframe

  noms age
1  jane  50
2  jane  50
3  jane  50
4  jane  50
5  mary  65
6  mary  65
7  mary  65
8  mary  65
9  john  34
10 john  34
11 john  34
12 john  34
13  pat  65
14  pat  65
15  pat  65
16 lucy  89
17 lucy  89
18 lucy  89
19 lucy  89
20  tom  12

Desired output

df

  noms fruits apple orange kiwi all_comb comb numbers age
1  mary  apple     1      0    0        1    1       1  65
2  mary  grape     0      0    0        0    1       2  65
3  mary orange     0      1    0        0    1       3  65
4  mary  apple     1      0    0        1    1       4  65
5  john banana     0      0    0        0    1       1  34
6  john  apple     1      0    0        1    1       2  34
7  john  apple     1      0    0        1    1       3  34
8  john  apple     1      0    0        1    1       4  34
9  lucy   kiwi     0      0    1        0    1       1  89
10 lucy orange     0      1    0        0    1       2  89
11 lucy  apple     1      0    0        1    1       3  89
12 lucy  berry     0      0    0        0    1       4  89
13  tom orange     0      1    0        0    1       1  12
Was it helpful?

Solution

It works if you use the unique rows of df2:

merge(df1, unique(df2))

   noms fruits apple orange kiwi all_comb comb numbers age
1  john banana     0      0    0        0    1       1  34
2  john  apple     1      0    0        1    1       2  34
3  john  apple     1      0    0        1    1       3  34
4  john  apple     1      0    0        1    1       4  34
5  lucy   kiwi     0      0    1        0    1       1  89
6  lucy orange     0      1    0        0    1       2  89
7  lucy  apple     1      0    0        1    1       3  89
8  lucy  berry     0      0    0        0    1       4  89
9  mary  apple     1      0    0        1    1       1  65
10 mary  grape     0      0    0        0    1       2  65
11 mary orange     0      1    0        0    1       3  65
12 mary  apple     1      0    0        1    1       4  65
13  tom orange     0      1    0        0    1       1  12

OTHER TIPS

Is this what you want to do?

df_agg <- aggregate(age ~ noms, df_large, max)
merge(df_agg, df_small, by = "noms")

or if you don't care about the ages,

df_agg <- data.frame(nom = unique(df_large$noms))
merge(df_agg, df_small, by = "noms")
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top