Hi everybody, I am working with a list of data frames in R and I want to merge them one by one. One solution I found is to combine Reduce() with merge(), but I don't get the same result as when I merge the data frames one at a time. My list of data frames is called global and has the following structure (the dput() version of the list is included at the end):
global
$a1
ID Value Products z1
1 001 1 3 1
2 002 2 2 1
3 003 3 0 1
4 004 4 1 1
5 005 5 1 1
6 006 6 6 1
7 007 7 7 1
8 009 8 1 1
9 010 9 1 1
$a2
ID Value Products z2
1 001 1 3 2
2 002 2 2 2
3 003 3 0 2
4 004 4 1 2
5 005 5 1 2
6 006 6 6 2
7 011 10 5 2
8 012 11 5 2
9 007 7 7 2
10 009 8 1 2
11 010 9 1 2
$a3
ID Value Products z3
1 001 1 3 3
2 002 2 2 3
3 012 11 5 3
4 013 11 1 3
5 014 11 2 3
6 003 3 0 3
7 004 4 1 3
8 005 5 1 3
9 006 6 6 3
10 007 7 7 3
11 009 8 1 3
12 010 9 1 3
13 011 10 5 3
$a4
ID Value Products z4
1 001 1 3 4
2 002 2 2 4
3 012 11 5 4
4 013 11 1 4
5 014 11 2 4
6 003 3 0 4
7 004 4 1 4
8 005 5 1 4
9 006 6 6 4
10 007 7 7 4
11 009 8 1 4
12 010 9 1 4
13 011 10 5 4
14 015 12 3 4
15 016 12 3 4
$a5
ID Value Products z5
1 001 1 3 5
2 002 2 2 5
3 003 3 0 5
4 004 4 1 5
5 016 12 3 5
6 017 14 2 5
7 005 5 1 5
8 006 6 6 5
9 007 7 7 5
10 009 8 1 5
11 010 9 1 5
12 011 10 5 5
13 012 11 5 5
14 013 11 1 5
15 014 11 2 5
16 015 12 3 5
17 018 14 2 5
I want to merge each data frame in global with the data frames that precede it. For this I used the following code, which creates a new list named listag:
listag=Reduce(function(x, y) merge(x,y[,c(1,4)],by=intersect(names(x)[1],names(y)[1]),all.x=TRUE),global,accumulate=TRUE)
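This behaviour is consistent with how Reduce() works: it folds from the left, so at each step x is the result of the previous step, and with accumulate = TRUE every element of the output descends from the first data frame. all.x = TRUE therefore keeps the rows of a1 at every step, never the rows of a2 or later frames. A minimal sketch on a small hypothetical toy list (not the real global) that compares the row counts of the inputs with those of the accumulated results:

```r
# Hypothetical toy list standing in for `global`: three data frames that
# grow by one row each, with an ID column and one "z" column.
toy <- list(
  a1 = data.frame(ID = c("001", "002"), z1 = 1, stringsAsFactors = FALSE),
  a2 = data.frame(ID = c("001", "002", "011"), z2 = 2, stringsAsFactors = FALSE),
  a3 = data.frame(ID = c("001", "002", "011", "012"), z3 = 3, stringsAsFactors = FALSE)
)

# accumulate = TRUE returns every intermediate fold result:
#   acc[[1]] is a1
#   acc[[2]] is merge(acc[[1]], a2, all.x = TRUE): x is a1, so a1's rows
#   acc[[3]] is merge(acc[[2]], a3, all.x = TRUE): still a1's rows
acc <- Reduce(function(x, y) merge(x, y, by = "ID", all.x = TRUE),
              toy, accumulate = TRUE)

sapply(toy, nrow)  # rows of the inputs: 2, 3, 4
sapply(acc, nrow)  # rows of the accumulated results: 2, 2, 2
```

In the question's data the same effect shows up as 9 rows (the size of a1) in every element of listag.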
I used the argument all.x=TRUE in merge() because I want each data frame to keep its original number of rows (a1 = 9, a2 = 11, a3 = 13, a4 = 15, a5 = 17). After this, I split global into individual data frames to check whether the code above works, and I found differences. To split the list I used this code:
list2env(global, envir=.GlobalEnv)
This gave me my five data frames. Now I will show what I want using data frames a4 and a5. First, I used the following code to merge a4 with a1, a2, a3 and a4:
Final41=merge(a4,a1[,c(1,4)],by=intersect(names(a4)[1],names(a1)[1]),all.x=TRUE)
Final42=merge(Final41,a2[,c(1,4)],by=intersect(names(Final41)[1],names(a2)[1]),all.x=TRUE)
Final43=merge(Final42,a3[,c(1,4)],by=intersect(names(Final42)[1],names(a3)[1]),all.x=TRUE)
Final4=merge(Final43,a4[,c(1,4)],by=intersect(names(Final43)[1],names(a4)[1]),all.x=TRUE)
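The four merge() calls above are themselves a left fold: each step takes the previous result as x and merges in the ID and z column of the next data frame. The only structural difference from listag is the starting value, which here is a4 rather than a1. A minimal sketch, assuming the column positions used in the question (ID first, z column fourth), that rebuilds global from the printed data and writes Final4 as a single Reduce() call with init = a4. The names ref, ids and Final4_fold are illustrative, and by = "ID" is used because intersect(names(x)[1], names(y)[1]) evaluates to "ID" for this data:

```r
# Rebuild `global` from the question's data: each ID has the same Value and
# Products in every data frame, so one lookup table plus the ID order of
# each data frame reproduces a1..a5 (ref and ids are illustrative names).
ref <- data.frame(
  ID = c("001", "002", "003", "004", "005", "006", "007", "009", "010",
         "011", "012", "013", "014", "015", "016", "017", "018"),
  Value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11, 11, 12, 12, 14, 14),
  Products = c(3, 2, 0, 1, 1, 6, 7, 1, 1, 5, 5, 1, 2, 3, 3, 2, 2),
  stringsAsFactors = FALSE
)
ids <- list(
  a1 = c("001", "002", "003", "004", "005", "006", "007", "009", "010"),
  a2 = c("001", "002", "003", "004", "005", "006", "011", "012", "007", "009", "010"),
  a3 = c("001", "002", "012", "013", "014", "003", "004", "005", "006", "007",
         "009", "010", "011"),
  a4 = c("001", "002", "012", "013", "014", "003", "004", "005", "006", "007",
         "009", "010", "011", "015", "016"),
  a5 = c("001", "002", "003", "004", "016", "017", "005", "006", "007", "009",
         "010", "011", "012", "013", "014", "015", "018")
)
global <- Map(function(id, k) {
  df <- ref[match(id, ref$ID), ]
  rownames(df) <- NULL
  df[[paste0("z", k)]] <- k  # z1 = 1, z2 = 2, ...
  df
}, ids, seq_along(ids))

# Same chain as Final41 -> Final42 -> Final43 -> Final4, expressed as one fold
# that starts from a4 and merges in ID + z column of a1, a2, a3, a4 in turn.
Final4_fold <- Reduce(
  function(x, y) merge(x, y[, c(1, 4)], by = "ID", all.x = TRUE),
  global[1:4],
  init = global$a4
)
Final4_fold
```

Because a4 already contains z4, merging a4's own z column again produces the z4.x and z4.y columns seen in Final4.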
The result of this code is:
Final4
ID Value Products z4.x z1 z2 z3 z4.y
1 001 1 3 4 1 2 3 4
2 002 2 2 4 1 2 3 4
3 003 3 0 4 1 2 3 4
4 004 4 1 4 1 2 3 4
5 005 5 1 4 1 2 3 4
6 006 6 6 4 1 2 3 4
7 007 7 7 4 1 2 3 4
8 009 8 1 4 1 2 3 4
9 010 9 1 4 1 2 3 4
10 011 10 5 4 NA 2 3 4
11 012 11 5 4 NA 2 3 4
12 013 11 1 4 NA NA 3 4
13 014 11 2 4 NA NA 3 4
14 015 12 3 4 NA NA NA 4
15 016 12 3 4 NA NA NA 4
Here all.x=TRUE works as expected: the original number of observations in a4 (15) is kept. However, when I extract the 4th element of listag, I get this:
f4l=listag[[4]]
f4l
ID Value Products z1 z2 z3 z4
1 001 1 3 1 2 3 4
2 002 2 2 1 2 3 4
3 003 3 0 1 2 3 4
4 004 4 1 1 2 3 4
5 005 5 1 1 2 3 4
6 006 6 6 1 2 3 4
7 007 7 7 1 2 3 4
8 009 8 1 1 2 3 4
9 010 9 1 1 2 3 4
I also use all.x=TRUE for the merge() inside Reduce(), but I don't get the same result: f4l has 9 rows instead of 15. I would like the combination of Reduce() and merge() to give me the result of Final4, and likewise for the rest of the data frames in listag when it is applied over global. For each data frame in listag I would like to get a result like this (shown here for the 4th data frame):
ID Value Products z1 z2 z3 z4
1 001 1 3 1 2 3 4
2 002 2 2 1 2 3 4
3 003 3 0 1 2 3 4
4 004 4 1 1 2 3 4
5 005 5 1 1 2 3 4
6 006 6 6 1 2 3 4
7 007 7 7 1 2 3 4
8 009 8 1 1 2 3 4
9 010 9 1 1 2 3 4
10 011 10 5 NA 2 3 4
11 012 11 5 NA 2 3 4
12 013 11 1 NA NA 3 4
13 014 11 2 NA NA 3 4
14 015 12 3 NA NA NA 4
15 016 12 3 NA NA NA 4
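To get this for every element of listag, the fold for the k-th result has to start from ak instead of from a1, and then merge in the ID and z column of a1 through ak. Starting from only the first three columns of ak (ID, Value, Products) avoids duplicating ak's own z column as z4.x/z4.y. A minimal sketch using base lapply() (plyr's llply() could be swapped in the same way), again assuming ID is the first column and the z column the fourth, and rebuilding global from the question's data; ref, ids and listag2 are illustrative names:

```r
# Rebuild `global` from the question's data (same values as the dput() output).
ref <- data.frame(
  ID = c("001", "002", "003", "004", "005", "006", "007", "009", "010",
         "011", "012", "013", "014", "015", "016", "017", "018"),
  Value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11, 11, 12, 12, 14, 14),
  Products = c(3, 2, 0, 1, 1, 6, 7, 1, 1, 5, 5, 1, 2, 3, 3, 2, 2),
  stringsAsFactors = FALSE
)
ids <- list(
  a1 = c("001", "002", "003", "004", "005", "006", "007", "009", "010"),
  a2 = c("001", "002", "003", "004", "005", "006", "011", "012", "007", "009", "010"),
  a3 = c("001", "002", "012", "013", "014", "003", "004", "005", "006", "007",
         "009", "010", "011"),
  a4 = c("001", "002", "012", "013", "014", "003", "004", "005", "006", "007",
         "009", "010", "011", "015", "016"),
  a5 = c("001", "002", "003", "004", "016", "017", "005", "006", "007", "009",
         "010", "011", "012", "013", "014", "015", "018")
)
global <- Map(function(id, k) {
  df <- ref[match(id, ref$ID), ]
  rownames(df) <- NULL
  df[[paste0("z", k)]] <- k
  df
}, ids, seq_along(ids))

# For each k: start from ak's ID/Value/Products (so all.x = TRUE keeps ak's
# rows) and left-join the ID + z column of a1, ..., ak in order.
listag2 <- setNames(lapply(seq_along(global), function(k) {
  Reduce(function(x, y) merge(x, y[, c(1, 4)], by = "ID", all.x = TRUE),
         global[seq_len(k)],
         init = global[[k]][, 1:3])
}), names(global))

sapply(listag2, nrow)  # one count per data frame: 9, 11, 13, 15, 17
listag2$a4
```

merge() sorts the result by the key column by default (sort = TRUE), which is why the rows come out in ID order as in Final4.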
I don't know what is wrong in my code when I combine Reduce() and merge(). I use all.x=TRUE in the same way as when I merge the data frames one by one. Could you help me with this? Maybe I have to add another argument to the combination of Reduce() and merge() to get my result, or there is another way, such as using lapply or llply from the plyr package over global. The dput() output for global is:
structure(list(a1 = structure(list(ID = c("001", "002", "003",
"004", "005", "006", "007", "009", "010"), Value = c(1, 2, 3,
4, 5, 6, 7, 8, 9), Products = c(3, 2, 0, 1, 1, 6, 7, 1, 1), z1 = c(1,
1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("ID", "Value", "Products",
"z1"), row.names = c(NA, 9L), class = "data.frame"), a2 = structure(list(
ID = c("001", "002", "003", "004", "005", "006", "011", "012",
"007", "009", "010"), Value = c(1, 2, 3, 4, 5, 6, 10, 11,
7, 8, 9), Products = c(3, 2, 0, 1, 1, 6, 5, 5, 7, 1, 1),
z2 = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), .Names = c("ID",
"Value", "Products", "z2"), row.names = c(NA, 11L), class = "data.frame"),
a3 = structure(list(ID = c("001", "002", "012", "013", "014",
"003", "004", "005", "006", "007", "009", "010", "011"),
Value = c(1, 2, 11, 11, 11, 3, 4, 5, 6, 7, 8, 9, 10),
Products = c(3, 2, 5, 1, 2, 0, 1, 1, 6, 7, 1, 1, 5),
z3 = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Names = c("ID",
"Value", "Products", "z3"), row.names = c(NA, 13L), class = "data.frame"),
a4 = structure(list(ID = c("001", "002", "012", "013", "014",
"003", "004", "005", "006", "007", "009", "010", "011", "015",
"016"), Value = c(1, 2, 11, 11, 11, 3, 4, 5, 6, 7, 8, 9,
10, 12, 12), Products = c(3, 2, 5, 1, 2, 0, 1, 1, 6, 7, 1,
1, 5, 3, 3), z4 = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4)), .Names = c("ID", "Value", "Products", "z4"), row.names = c(NA,
15L), class = "data.frame"), a5 = structure(list(ID = c("001",
"002", "003", "004", "016", "017", "005", "006", "007", "009",
"010", "011", "012", "013", "014", "015", "018"), Value = c(1,
2, 3, 4, 12, 14, 5, 6, 7, 8, 9, 10, 11, 11, 11, 12, 14),
Products = c(3, 2, 0, 1, 3, 2, 1, 6, 7, 1, 1, 5, 5, 1,
2, 3, 2), z5 = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5)), .Names = c("ID", "Value", "Products",
"z5"), row.names = c(NA, 17L), class = "data.frame")), .Names = c("a1",
"a2", "a3", "a4", "a5"))
Many thanks for your help.