Question

Here is my list, which you can run in your console (please tell me if it's too long for example purposes; I can amend it):

my_list = list(structure(list(PX_LAST = c(0.398, 0.457, 0.4, 0.159, 0.126, 
0.108, 0.26, 0.239, 0.222, 0.191, 0.184)), .Names = "PX_LAST", row.names = c("2014-04-28 00:00:00", 
"2014-04-29 00:00:00", "2014-04-30 00:00:00", "2014-05-02 00:00:00", 
"2014-05-05 00:00:00", "2014-05-06 00:00:00", "2014-05-07 00:00:00", 
"2014-05-08 00:00:00", "2014-05-09 00:00:00", "2014-05-12 00:00:00", 
"2014-05-13 00:00:00"), class = "data.frame"), structure(list(
    PX_LAST = c(1.731, 1.706, 1.7095, 1.69, 1.713, 1.711, 1.724, 
    1.699, 1.702, 1.705, 1.649, 1.611)), .Names = "PX_LAST", row.names = c("2014-04-29 00:00:00", 
"2014-04-30 00:00:00", "2014-05-01 00:00:00", "2014-05-02 00:00:00", 
"2014-05-05 00:00:00", "2014-05-06 00:00:00", "2014-05-07 00:00:00", 
"2014-05-08 00:00:00", "2014-05-09 00:00:00", "2014-05-12 00:00:00", 
"2014-05-13 00:00:00", "2014-05-14 00:00:00"), class = "data.frame"), 
    structure(list(PX_LAST = c(0.481, 0.456, 0.448, 0.439, 0.436, 
    0.448, 0.458, 0.466, 0.432, 0.437, 0.441, 0.417, 0.4035)), .Names = "PX_LAST", row.names = c("2014-04-28 00:00:00", 
    "2014-04-29 00:00:00", "2014-04-30 00:00:00", "2014-05-01 00:00:00", 
    "2014-05-02 00:00:00", "2014-05-05 00:00:00", "2014-05-06 00:00:00", 
    "2014-05-07 00:00:00", "2014-05-08 00:00:00", "2014-05-09 00:00:00", 
    "2014-05-12 00:00:00", "2014-05-13 00:00:00", "2014-05-14 00:00:00"
    ), class = "data.frame"), structure(list(PX_LAST = c(1.65, 
    1.65, 1.64, 1.65, 1.662, 1.6595, 1.665, 1.6595, 1.6625, 1.652, 
    1.645, 1.6245, 1.627, 1.633)), .Names = "PX_LAST", row.names = c("2014-04-25 00:00:00", 
    "2014-04-28 00:00:00", "2014-04-29 00:00:00", "2014-04-30 00:00:00", 
    "2014-05-01 00:00:00", "2014-05-02 00:00:00", "2014-05-05 00:00:00", 
    "2014-05-06 00:00:00", "2014-05-07 00:00:00", "2014-05-08 00:00:00", 
    "2014-05-09 00:00:00", "2014-05-12 00:00:00", "2014-05-13 00:00:00", 
    "2014-05-14 00:00:00"), class = "data.frame"))

My question is: how can I use do.call() on that list to merge all the data according to their date?

Note that both merge and cbind return errors that I am not able to resolve:

> do.call(what = merge, args = my_list)
Error in fix.by(by.x, x) : 
'by' must specify column(s) as numbers, names or logical

> do.call(what = cbind, args = my_list)
Error in data.frame(..., check.names = FALSE) : 
arguments imply differing number of rows: 11, 12, 13, 14

I would like to get a single data frame (with missing or non-matching dates filled with NAs) equal to the one I would get by calling merge() repeatedly on the elements of my_list.


Solution

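do.call() does not work here because merge() only accepts two data frames at a time; the third list element ends up being passed as the by argument, which causes the error you see. This would be a bit easier if you were not merging by row names, but you can do it with the Reduce function, which sequentially applies a function along a list of values (in this case, data.frames). Try: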

Reduce(function(x,y) {
    dd<-merge(x,y,by=0); rownames(dd)<-dd$Row.names; dd[-1]
}, my_list)

This merges only the rows whose row names match in every data frame. You can add all=TRUE to the merge() call to keep non-matching rows as well (filled with NAs), or customize it as you would any regular merge().
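
Since the question asks for non-matching dates to be filled with NAs, here is a minimal sketch of the all option mentioned above. The list toy_list and its values are made up for illustration (they are not the asker's data), and the helper name merge_by_rownames is hypothetical:

```r
# Toy data (hypothetical values): two data frames keyed by row-name dates.
# Distinct column names (A, B) mean merge() needs no suffixes here.
toy_list <- list(
  data.frame(A = c(1, 2, 3),
             row.names = c("2014-04-28", "2014-04-29", "2014-04-30")),
  data.frame(B = c(10, 20, 30),
             row.names = c("2014-04-29", "2014-04-30", "2014-05-01"))
)

# Merge on row names (by = 0); all = TRUE keeps dates missing from either side
merge_by_rownames <- function(x, y) {
  dd <- merge(x, y, by = 0, all = TRUE)
  rownames(dd) <- dd$Row.names  # restore the dates as row names
  dd[-1]                        # drop the helper Row.names column
}

merged <- Reduce(merge_by_rownames, toy_list)
print(merged)  # 4 rows; cells with no matching date are NA
```

The same function can be passed to Reduce with my_list in place of toy_list.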

You will get a warning about column names because all of your columns share the same name, so when you merge them into multiple columns, merge() doesn't know what to name them. You could rename them with something like:

my_new_list <- Map(
    function(x,n) {
        names(x)<-n; x
    }, 
    my_list, 
    paste("PX_LAST",1:length(my_list), sep="_")
)

then

Reduce(function(x,y) {
    dd<-merge(x,y,by=0); rownames(dd)<-dd$Row.names; dd[-1]
}, my_new_list)

won't complain.
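
As noted at the start of this answer, things are simpler when the dates live in a regular column rather than in the row names. The following is a rough sketch of that variant, not part of the original code: toy_list holds made-up values, and the column name date and the object with_key are assumptions introduced for illustration.

```r
# Toy data (hypothetical values), same shape as the question: dates as row names
toy_list <- list(
  data.frame(PX_LAST = c(1, 2),     row.names = c("2014-04-28", "2014-04-29")),
  data.frame(PX_LAST = c(10, 20),   row.names = c("2014-04-29", "2014-04-30")),
  data.frame(PX_LAST = c(100, 200), row.names = c("2014-04-28", "2014-04-30"))
)

# Move the row names into a real "date" column and give each value column
# a unique name (PX_LAST_1, PX_LAST_2, ...)
with_key <- Map(function(x, i) {
  out <- data.frame(date = rownames(x), value = x$PX_LAST,
                    stringsAsFactors = FALSE)
  names(out)[2] <- paste("PX_LAST", i, sep = "_")
  out
}, toy_list, seq_along(toy_list))

# A plain merge on the key column; all = TRUE fills non-matching dates with NA
merged <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE), with_key)
print(merged)
```

With an explicit key there is no Row.names column to clean up after each step, so the function passed to Reduce shrinks to a single merge() call.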

Other tips

Here is a solution using data.table and reshape2:

# Load libraries
library(data.table)
library(reshape2)

# Setup new list object 
my_list.2 <- vector("list", length(my_list))

# Add time stamps as variable and add ID variable
for(i in seq_along(my_list)){ 
  my_list.2[[i]] <- cbind(time=rownames(my_list[[i]]), my_list[[i]], id=rep(paste0("list_",i), times=nrow(my_list[[i]]))) 
}

# Collapse all lists in one data table
d.temp <- rbindlist(my_list.2)

# Transform the data
d.final <- dcast(time~id, value.var="PX_LAST", data=d.temp)


# > d.final
#                   time list_1 list_2 list_3 list_4
# 1  2014-04-28 00:00:00  0.398     NA 0.4810 1.6500
# 2  2014-04-29 00:00:00  0.457 1.7310 0.4560 1.6400
# 3  2014-04-30 00:00:00  0.400 1.7060 0.4480 1.6500
# 4  2014-05-02 00:00:00  0.159 1.6900 0.4360 1.6595
# 5  2014-05-05 00:00:00  0.126 1.7130 0.4480 1.6650
# 6  2014-05-06 00:00:00  0.108 1.7110 0.4580 1.6595
# 7  2014-05-07 00:00:00  0.260 1.7240 0.4660 1.6625
# 8  2014-05-08 00:00:00  0.239 1.6990 0.4320 1.6520
# 9  2014-05-09 00:00:00  0.222 1.7020 0.4370 1.6450
# 10 2014-05-12 00:00:00  0.191 1.7050 0.4410 1.6245
# 11 2014-05-13 00:00:00  0.184 1.6490 0.4170 1.6270
# 12 2014-05-01 00:00:00     NA 1.7095 0.4390 1.6620
# 13 2014-05-14 00:00:00     NA 1.6110 0.4035 1.6330
# 14 2014-04-25 00:00:00     NA     NA     NA 1.6500
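
The rows in the printed result are not in chronological order (for example, 2014-04-25 comes last). A minimal sketch of reordering by date in base R, using three rows and one column copied from the output above as a stand-in for d.final; the names d_toy and d_sorted are hypothetical:

```r
# Stand-in for d.final: three rows taken from the printed output above
d_toy <- data.frame(
  time   = c("2014-04-28 00:00:00", "2014-05-01 00:00:00", "2014-04-25 00:00:00"),
  list_1 = c(0.398, NA, NA),
  stringsAsFactors = FALSE
)

# as.Date() parses the leading "YYYY-MM-DD" part of each time stamp;
# order() then gives the row permutation that sorts by date
d_sorted <- d_toy[order(as.Date(d_toy$time)), ]
print(d_sorted)
```

The same indexing expression should apply to d.final, assuming its time column holds time stamps in this format.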
Licensed under: CC-BY-SA with attribution