Question

I have two data frames that contain time series, but one has fewer rows than the other. I would like to extract the rows of the longer data frame for which there exists an entry with the corresponding date in the shorter data frame.

My idea was to simply get the Date values of the shorter data frame in a vector and somehow use that vector inside an indexing condition, so that only the entries of the longer data frame for which the Date entry is included in the Date vector in the indexing condition are selected. Unfortunately I was unable to find a way to use the vector in a boolean condition and the only solution that I have found involved writing a slow for-loop. Can anybody tell me how to do this without a for-loop?

Solution

Consider the following example. You have two data frames, df1 and df2, where df1 has more rows than df2.

date <- seq(as.Date("2014-01-01"),as.Date("2014-01-10"), 1)
value <- 20:29
df1 <- data.frame(date,value)

df1
#         date value
#1  2014-01-01    20
#2  2014-01-02    21
#3  2014-01-03    22
#4  2014-01-04    23
#5  2014-01-05    24
#6  2014-01-06    25
#7  2014-01-07    26
#8  2014-01-08    27
#9  2014-01-09    28
#10 2014-01-10    29

date <- seq(as.Date("2014-01-01"),as.Date("2014-01-10"), 2)
value2 <- 5:1
df2 <- data.frame(date, value2)

df2
#        date value2 
#1 2014-01-01      5
#2 2014-01-03      4
#3 2014-01-05      3
#4 2014-01-07      2
#5 2014-01-09      1

To extract all the rows of df1 which have corresponding date entries in df2 you can use the %in% operator:

df1[df1$date %in% df2$date, ]

#        date value
#1 2014-01-01    20
#3 2014-01-03    22
#5 2014-01-05    24
#7 2014-01-07    26
#9 2014-01-09    28

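This should work the same way for your data, provided the date columns in both data frames are stored as the same type (ideally both of class Date). If one is a Date and the other is, say, a character vector, the comparison may give unexpected results, so convert with as.Date() first.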
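
To see what is going on, it can help to look at the intermediate result: %in% returns a logical vector with one TRUE/FALSE per row of df1, and that vector is what does the indexing. Below is a minimal sketch that rebuilds the example data so it runs on its own; the names keep, subset_df1 and both are only illustrative. It also shows merge() as one possible alternative, assuming you want the columns of df2 alongside those of df1 rather than just the filtered rows.

```r
# Rebuild the example data so this block is self-contained.
df1 <- data.frame(date = seq(as.Date("2014-01-01"), as.Date("2014-01-10"), 1),
                  value = 20:29)
df2 <- data.frame(date = seq(as.Date("2014-01-01"), as.Date("2014-01-10"), 2),
                  value2 = 5:1)

# One TRUE/FALSE per row of df1: TRUE where the date also occurs in df2.
keep <- df1$date %in% df2$date
keep

# Same filter as before, written with the intermediate logical vector.
subset_df1 <- df1[keep, ]
subset_df1

# Alternative: an inner join on date, which also brings along value2 from df2.
both <- merge(df1, df2, by = "date")
both
```

The %in% version keeps the original row names of df1 (1, 3, 5, ...), while merge() returns a new data frame sorted by date with fresh row names.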

Licensed under: CC-BY-SA with attribution