Question

I have two dataframes

#df1
type <- c("A", "B", "C")
day_start <- c(5,8,4)
day_end <- c(12,10,11)
df1 <- cbind.data.frame(type, day_start, day_end)
df1
  type day_start day_end
1    A         5      12
2    B         8      10
3    C         4      11

#df2
value <- 1:10
day <- 4:13
df2 <- cbind.data.frame(day, value)
df2
   day value
1    4     1
2    5     2
3    6     3
4    7     4
5    8     5
6    9     6
7   10     7
8   11     8
9   12     9
10  13    10

I would like to subset df2 such that each level of factor "type" in df1 gets its own dataframe, only including the rows/days between day_start and day_end of this factor level.

The desired outcome for "A" would be:

list_of_dataframes$df_A
   day value
1    5     2
2    6     3
3    7     4
4    8     5
5    9     6
6   10     7
7   11     8
8   12     9

I found this question on SO with an answer suggesting mapply(), but I cannot figure out how to adapt the code given there to fit my data and desired outcome. Can someone help me out?

Solution

The following solution assumes that all values for days are integers. If that assumption holds, it is an easy one-liner:

> apply(df1, 1, function(x) df2[df2$day %in% x[2]:x[3],])
[[1]]
  day value
2   5     2
3   6     3
4   7     4
5   8     5
6   9     6
7  10     7
8  11     8
9  12     9

[[2]]
  day value
5   8     5
6   9     6
7  10     7

[[3]]
  day value
1   4     1
2   5     2
3   6     3
4   7     4
5   8     5
6   9     6
7  10     7
8  11     8

You can use setNames to name the dataframes in the list:

setNames(apply(df1, 1, function(x) df2[df2$day %in% x[2]:x[3],]),df1[,1])
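
One caveat with the apply() approach: apply() first converts df1 to a matrix, and because type is a character column, the day columns become character strings as well; the one-liner still works because `:` coerces them back to numbers. A minimal sketch of an alternative that iterates over the columns directly with Map() is given here. It assumes the df1 and df2 from the question; the `df_` name prefix and the resetting of row names are additions made here to match the desired output shown in the question.

```r
# Data from the question
df1 <- data.frame(type = c("A", "B", "C"),
                  day_start = c(5, 8, 4),
                  day_end = c(12, 10, 11))
df2 <- data.frame(day = 4:13, value = 1:10)

# Map() walks over day_start and day_end in parallel, so the columns keep
# their numeric type (no conversion to a character matrix as with apply())
list_of_dataframes <- Map(function(start, end) {
  out <- df2[df2$day >= start & df2$day <= end, ]
  rownames(out) <- NULL  # renumber rows from 1, as in the desired output
  out
}, df1$day_start, df1$day_end)

# Name the elements df_A, df_B, df_C
names(list_of_dataframes) <- paste0("df_", df1$type)

list_of_dataframes$df_A
```

Since this version compares with >= and <= instead of building an integer sequence, it does not depend on the integer-days assumption.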

Other tips

Yes, you can use mapply:

Define a function that will do what you want:

fun <- function(x,y) df2[df2$day >= x & df2$day <= y,]

Then use mapply to apply this function with every element of day_start and day_end:

final.output <- mapply(fun,df1$day_start, df1$day_end, SIMPLIFY=FALSE)

This will give you a list with the outputs you want:

final.output

[[1]]
  day value
2   5     2
3   6     3
4   7     4
5   8     5
6   9     6
7  10     7
8  11     8
9  12     9

[[2]]
  day value
5   8     5
6   9     6
7  10     7

[[3]]
  day value
1   4     1
2   5     2
3   6     3
4   7     4
5   8     5
6   9     6
7  10     7
8  11     8

You can name each data.frame of the list with setNames:

final.output <- setNames(final.output,df1$type)
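
Note that fun looks up df2 in the global environment. As a minimal sketch, the same steps can be wrapped in one function that takes both data frames as arguments; the name split_by_range, its argument names, and the df2_frac example data are hypothetical and not part of the original answer. Because the function compares with >= and <=, it should also work when day is not an integer.

```r
# Hypothetical helper: one data frame per row of `ranges`, holding the rows
# of `data` whose day lies between day_start and day_end (inclusive)
split_by_range <- function(ranges, data) {
  pieces <- mapply(
    function(start, end) data[data$day >= start & data$day <= end, ],
    ranges$day_start, ranges$day_end,
    SIMPLIFY = FALSE
  )
  # as.character() keeps the labels even if type is stored as a factor
  setNames(pieces, as.character(ranges$type))
}

df1 <- data.frame(type = c("A", "B", "C"),
                  day_start = c(5, 8, 4),
                  day_end = c(12, 10, 11))
# Made-up data with non-integer days, to show the comparison still applies
df2_frac <- data.frame(day = seq(4, 13, by = 0.5), value = 1:19)

result <- split_by_range(df1, df2_frac)
result$B
```

Here result$B holds the rows with day 8, 8.5, 9, 9.5 and 10.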

Or you can also put an attribute type on the data.frames of the list:

fun <- function(x,y, type){
  df <- df2[df2$day >= x & df2$day <= y,]
  attr(df, "type") <- as.character(type)
  df
}

Then each data.frame of final.output will have an attribute so you know which type it is:

final.output <- mapply(fun,df1$day_start, df1$day_end,df1$type, SIMPLIFY=FALSE)

# check which type the first data.frame is
attr(final.output[[1]], "type")
[1] "A"

Finally, if you do not want a list with the 3 data.frames you can create a function that assigns the 3 data.frames to the global environment:

fun <- function(x,y, type){
  df <- df2[df2$day >= x & df2$day <= y,]
  name <- as.character(type)
  assign(name, df, pos=.GlobalEnv)
}

mapply(fun,df1$day_start, df1$day_end, type=df1$type, SIMPLIFY=FALSE)

This will create 3 separate data.frames in the global environment named A, B and C.
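
If the data frames are already in a named list (as with setNames above), base R's list2env() can create the separate objects in one call. A minimal sketch, assuming the data from the question: it copies the list elements into a fresh environment (called env here, a name chosen for illustration) rather than .GlobalEnv, so existing objects named A, B or C are not overwritten. Passing envir = .GlobalEnv instead would create global variables as described above.

```r
df1 <- data.frame(type = c("A", "B", "C"),
                  day_start = c(5, 8, 4),
                  day_end = c(12, 10, 11))
df2 <- data.frame(day = 4:13, value = 1:10)

# Build the named list of subsets
subsets <- setNames(
  mapply(function(start, end) df2[df2$day >= start & df2$day <= end, ],
         df1$day_start, df1$day_end, SIMPLIFY = FALSE),
  as.character(df1$type)
)

# Copy each list element into the environment as a separate object;
# invisible() only suppresses printing of the returned environment
env <- new.env()
invisible(list2env(subsets, envir = env))

ls(env)  # names of the objects that were created
env$B    # the subset for type "B"
```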

Licensed under: CC-BY-SA with attribution